Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization
Aurélien Pion, Emmanuel Vazquez
- Published
- May 19, 2026 — 17:32 UTC
Problem
This paper addresses the gap in the calibration of Gaussian process (GP) predictive distributions specifically for Bayesian optimization (BO) in the context of lower-tail performance. The authors highlight that traditional GP models often suffer from miscalibration, particularly below a threshold value ( t ), which adversely affects the exploration-exploitation trade-off in BO. This issue is particularly critical when using sampling criteria like expected improvement (EI), which rely on accurate lower-tail predictions. The work is presented as a preprint and has not yet undergone peer review.
Method
The authors introduce a framework for goal-oriented calibration of GP predictive distributions below a specified threshold ( t ). This framework is built on two key concepts: occurrence calibration across the design space and thresholded ( \mu )-calibration on sublevel sets defined as ( {x \in \mathbb{X}, f(x) \le t} ). The proposed method, termed tcGP, is a post-hoc calibration technique that adjusts the GP predictive distributions to enhance reliability in the lower tail. The authors utilize maximum likelihood for hyperparameter selection in standard GP models. The calibration process is designed to maintain density in the design space, ensuring that the optimization algorithm remains effective. The paper does not disclose specific training compute requirements.
Results
The experimental results demonstrate that tcGP significantly improves lower-tail calibration and BO performance compared to standard GP models and globally calibrated GP models. The authors report that tcGP achieves a lower mean squared error (MSE) in lower-tail predictions, outperforming the baselines on standard benchmarks. While specific numerical results are not detailed in the abstract, the improvements in calibration and optimization efficacy suggest substantial effect sizes, indicating that the proposed method effectively addresses the miscalibration issue.
Limitations
The authors acknowledge that their approach is limited to the noiseless setting, which may not generalize to scenarios with noise in the objective function. Additionally, the reliance on maximum likelihood for hyperparameter selection may not capture the full complexity of the underlying function, potentially leading to suboptimal performance in certain cases. The paper does not explore the scalability of tcGP in high-dimensional spaces or its performance in dynamic optimization scenarios.
Why it matters
This work has significant implications for the field of Bayesian optimization, particularly in applications where accurate lower-tail predictions are critical, such as in safety-critical systems or resource-constrained environments. By improving the calibration of GP predictive distributions, tcGP enhances the reliability of BO algorithms, potentially leading to more efficient optimization processes. This advancement opens avenues for further research into calibration techniques and their integration into various optimization frameworks, particularly in settings where the cost of evaluations is high.
Authors: Aurélien Pion, Emmanuel Vazquez
Source: arXiv:2605.20145
URL: https://arxiv.org/abs/2605.20145v1
By Turing Wire editorial staff · May 19, 2026 · Editorial standards →
Source: arXiv cs.LG