Title: A solution to the ill-conditioning of gradient-enhanced covariance matrices for Gaussian processes
Authors: André L. Marchildon, David W. Zingg
DOI: 10.1002/nme.7498 (https://onlinelibrary.wiley.com/doi/10.1002/nme.7498)
Journal: International Journal for Numerical Methods in Engineering (Impact Factor 2.7, JCR Q1, Engineering, Multidisciplinary)
Published: 2024-05-13 (Journal Article)
Citations: 0
Abstract
Gaussian processes provide probabilistic surrogates for various applications including classification, uncertainty quantification, and optimization. Using a gradient-enhanced covariance matrix can be beneficial since it provides a more accurate surrogate relative to its gradient-free counterpart. An acute problem for Gaussian processes, particularly those that use gradients, is the ill-conditioning of their covariance matrices. Several methods have been developed to address this problem for gradient-enhanced Gaussian processes, but they have various drawbacks such as limiting the data that can be used, imposing a minimum distance between evaluation points in the parameter space, or constraining the hyperparameters. In this paper a diagonal preconditioner is applied to the covariance matrix along with a modest nugget to ensure that the condition number of the covariance matrix is bounded, while avoiding the drawbacks listed above. The method can be applied with any twice-differentiable kernel and when there are noisy function and gradient evaluations. Optimization results for a gradient-enhanced Bayesian optimizer with the Gaussian kernel are compared for the preconditioning method, a baseline method that constrains the hyperparameters, and a rescaling method that increases the distance between evaluation points. The Bayesian optimizer with the preconditioning method converges the optimality, that is, the ℓ₂ norm of the gradient, an additional 5 to 9 orders of magnitude relative to when the baseline method is used, and it does so in fewer iterations than with the rescaling method. The preconditioning method is available in the open-source Python library GpGradPy, which can be found at https://github.com/marchildon/gpgradpy/tree/paper_precon.
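As a rough illustration of the idea in the abstract (not the paper's exact preconditioner, which is derived in the article and implemented in GpGradPy), the sketch below builds a gradient-enhanced covariance matrix with the Gaussian kernel for clustered 1-D points, then applies a simple Jacobi-style diagonal rescaling plus a modest nugget and compares condition numbers. All function names and parameter values here are hypothetical choices for the demonstration.

```python
import numpy as np

def grad_enhanced_cov(x, theta):
    """Gradient-enhanced Gaussian-kernel covariance for 1-D points x.

    Block structure [[K_ff, K_fg], [K_fg.T, K_gg]]: the kernel, its first
    cross-derivative, and its second mixed derivative.
    """
    r = x[:, None] - x[None, :]                    # pairwise differences x_i - x_j
    k = np.exp(-r**2 / (2.0 * theta**2))           # Gaussian kernel values
    k_fg = (r / theta**2) * k                      # d k / d x_j
    k_gg = (1.0 / theta**2 - r**2 / theta**4) * k  # d^2 k / (d x_i d x_j)
    return np.block([[k, k_fg], [k_fg.T, k_gg]])

def precondition(K, nugget=1e-6):
    """Diagonal (Jacobi-style) rescaling to unit diagonal, plus a modest nugget."""
    d = 1.0 / np.sqrt(np.diag(K))
    P = np.diag(d)
    return P @ K @ P + nugget * np.eye(K.shape[0])

# Tightly clustered evaluation points make the covariance matrix ill-conditioned.
x = np.array([0.0, 1e-4, 2e-4, 0.5, 1.0, 1.0 + 1e-4])
K = grad_enhanced_cov(x, theta=0.2)
K_pc = precondition(K)

# The rescaled-plus-nugget matrix is far better conditioned than the original.
print(f"cond(K)    = {np.linalg.cond(K):.2e}")
print(f"cond(K_pc) = {np.linalg.cond(K_pc):.2e}")
```

With a nugget η and unit diagonal after rescaling, the condition number is bounded by roughly n(1 + η)/η for an n × n matrix, which is the kind of guaranteed bound the paper's method provides without restricting the data or the hyperparameters.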
About the journal:
The International Journal for Numerical Methods in Engineering publishes original papers describing significant, novel developments in numerical methods that are applicable to engineering problems.
The Journal is known for welcoming contributions in a wide range of areas in computational engineering, including computational issues in model reduction, uncertainty quantification, verification and validation, inverse analysis and stochastic methods, optimisation, element technology, solution techniques and parallel computing, damage and fracture, mechanics at micro and nano-scales, low-speed fluid dynamics, fluid-structure interaction, electromagnetics, coupled diffusion phenomena, and error estimation and mesh generation. It is emphasized that this is by no means an exhaustive list, and particularly papers on multi-scale, multi-physics or multi-disciplinary problems, and on new, emerging topics are welcome.