Response probability distribution estimation of expensive computer simulators: A Bayesian active learning perspective using Gaussian process regression
Chao Dang, Marcos A. Valdebenito, Nataly A. Manque, Jun Xu, Matthias G. R. Faes
arXiv - STAT - Computation, published 2024-08-31. DOI: arxiv-2409.00407 (https://doi.org/arxiv-2409.00407)
Abstract
Estimation of the response probability distributions of computer simulators
in the presence of randomness is a crucial task in many fields. However,
achieving this task with guaranteed accuracy remains an open computational
challenge, especially for expensive-to-evaluate computer simulators. This
work presents a Bayesian active learning perspective on the challenge, based
on Gaussian process (GP) regression.
First, estimation of the response probability distributions is conceptually
interpreted as a Bayesian inference problem, as opposed to frequentist
inference. This interpretation provides several important benefits: (1) it
quantifies and propagates discretization error probabilistically; (2) it
incorporates prior knowledge of the computer simulator; and (3) it enables the
effective reduction of numerical uncertainty in the solution to a prescribed
level. This conceptual Bayesian idea is then realized using GP regression:
we derive the posterior statistics of the response probability
distributions in semi-analytical form and provide a numerical solution
scheme. Building on this practical Bayesian approach, a Bayesian active
learning (BAL) method is further proposed for estimating the response
probability distributions. The key contribution is the development of two
crucial components for active learning, a stopping criterion and a learning
function, both of which take advantage of the posterior statistics. Five
numerical examples demonstrate empirically that the proposed BAL method can
efficiently estimate the response probability distributions with the
desired accuracy.
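The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not state their learning function, stopping criterion, or semi-analytical posterior statistics, so everything below is an assumption chosen for illustration. The sketch uses a zero-mean GP with a fixed squared-exponential kernel, a hypothetical one-dimensional `simulator`, the pointwise posterior probability Phi((y - mean) / std) as the posterior mean of the indicator of the event g(x) <= y, the summed indicator variance p(1 - p) as a stand-in learning function, and a Cauchy-Schwarz upper bound on the posterior standard deviation of the CDF as a stand-in stopping criterion.

```python
import math
import numpy as np

def rbf(a, b, length, var):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, length=0.5, var=1.0, jitter=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_test.

    Hyperparameters are fixed by assumption; a real study would fit them.
    """
    K = rbf(x_train, x_train, length, var) + jitter * np.eye(len(x_train))
    Ks = rbf(x_train, x_test, length, var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    variance = np.clip(var - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(variance)

# Standard normal CDF, vectorized over arrays (standard library erf only).
norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def simulator(x):
    """Hypothetical stand-in for an expensive computer simulator."""
    return np.sin(3.0 * x) + 0.5 * x

def estimate_cdf(n_init=4, max_evals=30, tol=0.01, seed=0):
    """Actively learn the response CDF F(y) = P(g(X) <= y) for X ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x_pool = rng.standard_normal(1000)      # Monte Carlo samples of the input
    y_grid = np.linspace(-3.0, 3.0, 31)     # response levels where F is estimated
    x_train = np.linspace(-2.5, 2.5, n_init)
    y_train = simulator(x_train)
    while True:
        mean, std = gp_posterior(x_train, y_train, x_pool)
        # p[i, j]: posterior probability that g(x_pool[j]) <= y_grid[i]
        p = norm_cdf((y_grid[:, None] - mean[None, :]) / std[None, :])
        cdf = p.mean(axis=1)                # posterior mean of F on the grid
        u = p * (1.0 - p)                   # posterior variance of each indicator
        # By Cauchy-Schwarz, the posterior std of the sample-average CDF is at
        # most the average of the indicator stds; use the worst grid level.
        bound = np.sqrt(u).mean(axis=1).max()
        if bound < tol or len(x_train) >= max_evals:
            return y_grid, cdf, len(x_train), bound
        # Assumed learning function: the pool point with the largest total
        # indicator variance across all response levels.
        x_new = x_pool[np.argmax(u.sum(axis=0))]
        x_train = np.append(x_train, x_new)
        y_train = np.append(y_train, simulator(x_new))

if __name__ == "__main__":
    y_grid, cdf, n_evals, bound = estimate_cdf()
    print(f"simulator calls: {n_evals}, std bound: {bound:.4f}")
```

The stopping rule is conservative by construction, since it bounds the posterior standard deviation of the CDF estimate rather than computing it, which avoids evaluating pairwise covariances between indicators.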