Title: A solution to the ill-conditioning of gradient-enhanced covariance matrices for Gaussian processes
Authors: André L. Marchildon, David W. Zingg
DOI: 10.1002/nme.7498 (https://onlinelibrary.wiley.com/doi/10.1002/nme.7498)
Journal: International Journal for Numerical Methods in Engineering (Impact Factor 2.7, JCR Q1, Engineering, Multidisciplinary)
Published: 2024-05-13 (Journal Article)
Citations: 0
Abstract
Gaussian processes provide probabilistic surrogates for various applications including classification, uncertainty quantification, and optimization. Using a gradient-enhanced covariance matrix can be beneficial since it provides a more accurate surrogate relative to its gradient-free counterpart. An acute problem for Gaussian processes, particularly those that use gradients, is the ill-conditioning of their covariance matrices. Several methods have been developed to address this problem for gradient-enhanced Gaussian processes, but they have various drawbacks such as limiting the data that can be used, imposing a minimum distance between evaluation points in the parameter space, or constraining the hyperparameters. In this paper a diagonal preconditioner is applied to the covariance matrix along with a modest nugget to ensure that the condition number of the covariance matrix is bounded, while avoiding the drawbacks listed above. The method can be applied with any twice-differentiable kernel and when there are noisy function and gradient evaluations. Optimization results for a gradient-enhanced Bayesian optimizer with the Gaussian kernel are compared for the preconditioning method, a baseline method that constrains the hyperparameters, and a rescaling method that increases the distance between evaluation points. The Bayesian optimizer with the preconditioning method converges the optimality, that is, the ℓ₂ norm of the gradient, an additional 5 to 9 orders of magnitude relative to when the baseline method is used, and it does so in fewer iterations than with the rescaling method. The preconditioning method is available in the open-source Python library GpGradPy, which can be found at https://github.com/marchildon/gpgradpy/tree/paper_precon.
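As a rough illustration of the idea in the abstract (not the paper's exact preconditioner, which is derived in the article and implemented in GpGradPy), the sketch below builds a gradient-enhanced covariance matrix with the Gaussian kernel for clustered 1-D points, then applies a simple Jacobi-style diagonal rescaling plus a modest nugget and compares condition numbers. All function names and parameter values here are hypothetical choices for the demonstration.

```python
import numpy as np

def grad_enhanced_cov(x, theta):
    """Gradient-enhanced Gaussian-kernel covariance for 1-D points x.

    Block structure [[K_ff, K_fg], [K_fg.T, K_gg]]: the kernel, its first
    cross-derivative, and its second mixed derivative.
    """
    r = x[:, None] - x[None, :]                    # pairwise differences x_i - x_j
    k = np.exp(-r**2 / (2.0 * theta**2))           # Gaussian kernel values
    k_fg = (r / theta**2) * k                      # d k / d x_j
    k_gg = (1.0 / theta**2 - r**2 / theta**4) * k  # d^2 k / (d x_i d x_j)
    return np.block([[k, k_fg], [k_fg.T, k_gg]])

def precondition(K, nugget=1e-6):
    """Diagonal (Jacobi-style) rescaling to unit diagonal, plus a modest nugget."""
    d = 1.0 / np.sqrt(np.diag(K))
    P = np.diag(d)
    return P @ K @ P + nugget * np.eye(K.shape[0])

# Tightly clustered evaluation points make the covariance matrix ill-conditioned.
x = np.array([0.0, 1e-4, 2e-4, 0.5, 1.0, 1.0 + 1e-4])
K = grad_enhanced_cov(x, theta=0.2)
K_pc = precondition(K)

# The rescaled-plus-nugget matrix is far better conditioned than the original.
print(f"cond(K)    = {np.linalg.cond(K):.2e}")
print(f"cond(K_pc) = {np.linalg.cond(K_pc):.2e}")
```

With a nugget η and unit diagonal after rescaling, the condition number is bounded by roughly n(1 + η)/η for an n × n matrix, which is the kind of guaranteed bound the paper's method provides without restricting the data or the hyperparameters.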
About the journal:
The International Journal for Numerical Methods in Engineering publishes original papers describing significant, novel developments in numerical methods that are applicable to engineering problems.
The Journal is known for welcoming contributions in a wide range of areas in computational engineering, including computational issues in model reduction, uncertainty quantification, verification and validation, inverse analysis and stochastic methods, optimisation, element technology, solution techniques and parallel computing, damage and fracture, mechanics at micro and nano-scales, low-speed fluid dynamics, fluid-structure interaction, electromagnetics, coupled diffusion phenomena, and error estimation and mesh generation. It is emphasized that this is by no means an exhaustive list, and particularly papers on multi-scale, multi-physics or multi-disciplinary problems, and on new, emerging topics are welcome.