Accelerating Hamiltonian Monte Carlo for Bayesian inference in neural networks and neural operators

IF 7.3 · CAS Tier 1 (Engineering & Technology) · Q1 ENGINEERING, MULTIDISCIPLINARY
Ponkrshnan Thiagarajan, Tamer A. Zaki, Michael D. Shields
{"title":"神经网络和神经算子中贝叶斯推理的加速哈密顿蒙特卡罗","authors":"Ponkrshnan Thiagarajan ,&nbsp;Tamer A. Zaki ,&nbsp;Michael D. Shields","doi":"10.1016/j.cma.2025.118401","DOIUrl":null,"url":null,"abstract":"<div><div>Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the network’s parameter space and the non-convexity of their posterior distributions. Therefore, various approximation techniques, such as variational inference (VI) or stochastic gradient MCMC, are often employed to infer the posterior distribution of the network parameters. Such approximations introduce inaccuracies in the inferred distributions, resulting in unreliable uncertainty estimates. In this work, we propose a hybrid approach that combines inexpensive VI and accurate HMC methods to efficiently and accurately quantify uncertainties in neural networks and neural operators. The proposed approach leverages an initial VI training on the full network. We examine the influence of individual parameters on the prediction uncertainty, which shows that a large proportion of the parameters do not contribute substantially to uncertainty in the network predictions. This information is then used to significantly reduce the dimension of the parameter space, and HMC is performed only for the subset of network parameters that strongly influence prediction uncertainties. This yields a framework for accelerating the full batch HMC for posterior inference in neural networks. We demonstrate the efficiency and accuracy of the proposed framework on deep neural networks and operator networks, showing that inference can be performed for large networks with tens to hundreds of thousands of parameters. We show that this method can effectively learn surrogates for complex physical systems by modeling the operator that maps from upstream conditions to wall-pressure data on a cone in hypersonic flow.</div></div>","PeriodicalId":55222,"journal":{"name":"Computer Methods in Applied Mechanics and Engineering","volume":"447 ","pages":"Article 118401"},"PeriodicalIF":7.3000,"publicationDate":"2025-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Accelerating Hamiltonian Monte Carlo for Bayesian inference in neural networks and neural operators\",\"authors\":\"Ponkrshnan Thiagarajan ,&nbsp;Tamer A. Zaki ,&nbsp;Michael D. Shields\",\"doi\":\"10.1016/j.cma.2025.118401\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the network’s parameter space and the non-convexity of their posterior distributions. Therefore, various approximation techniques, such as variational inference (VI) or stochastic gradient MCMC, are often employed to infer the posterior distribution of the network parameters. Such approximations introduce inaccuracies in the inferred distributions, resulting in unreliable uncertainty estimates. In this work, we propose a hybrid approach that combines inexpensive VI and accurate HMC methods to efficiently and accurately quantify uncertainties in neural networks and neural operators. 
The proposed approach leverages an initial VI training on the full network. We examine the influence of individual parameters on the prediction uncertainty, which shows that a large proportion of the parameters do not contribute substantially to uncertainty in the network predictions. This information is then used to significantly reduce the dimension of the parameter space, and HMC is performed only for the subset of network parameters that strongly influence prediction uncertainties. This yields a framework for accelerating the full batch HMC for posterior inference in neural networks. We demonstrate the efficiency and accuracy of the proposed framework on deep neural networks and operator networks, showing that inference can be performed for large networks with tens to hundreds of thousands of parameters. We show that this method can effectively learn surrogates for complex physical systems by modeling the operator that maps from upstream conditions to wall-pressure data on a cone in hypersonic flow.</div></div>\",\"PeriodicalId\":55222,\"journal\":{\"name\":\"Computer Methods in Applied Mechanics and Engineering\",\"volume\":\"447 \",\"pages\":\"Article 118401\"},\"PeriodicalIF\":7.3000,\"publicationDate\":\"2025-09-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Methods in Applied Mechanics and Engineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0045782525006735\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Methods in Applied Mechanics and Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0045782525006735","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high dimensionality of the network’s parameter space and the non-convexity of their posterior distributions. Therefore, various approximation techniques, such as variational inference (VI) or stochastic gradient MCMC, are often employed to infer the posterior distribution of the network parameters. Such approximations introduce inaccuracies in the inferred distributions, resulting in unreliable uncertainty estimates. In this work, we propose a hybrid approach that combines inexpensive VI and accurate HMC methods to efficiently and accurately quantify uncertainties in neural networks and neural operators. The proposed approach leverages an initial VI training on the full network. We examine the influence of individual parameters on the prediction uncertainty, which shows that a large proportion of the parameters do not contribute substantially to uncertainty in the network predictions. This information is then used to significantly reduce the dimension of the parameter space, and HMC is performed only for the subset of network parameters that strongly influence prediction uncertainties. This yields a framework for accelerating the full batch HMC for posterior inference in neural networks. We demonstrate the efficiency and accuracy of the proposed framework on deep neural networks and operator networks, showing that inference can be performed for large networks with tens to hundreds of thousands of parameters. We show that this method can effectively learn surrogates for complex physical systems by modeling the operator that maps from upstream conditions to wall-pressure data on a cone in hypersonic flow.
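The two-stage idea described in the abstract is straightforward to prototype. The sketch below is a minimal, hypothetical illustration (not the authors' implementation) on a toy regression problem: a mean-field Gaussian VI posterior is assumed to have already been fit over all parameters, the parameters are then ranked by a simple proxy for their influence on prediction uncertainty (here, the VI posterior standard deviation), and full-batch HMC with a leapfrog integrator is run only over the top-ranked subset while the remaining parameters stay frozen at their VI means. The toy network, the subset size `k`, and the use of the VI standard deviation as the influence measure are all assumptions made for illustration.

```python
# Sketch of VI-screened, subset-only HMC (illustrative only; not the paper's code).
import jax
import jax.numpy as jnp

# --- toy data and a tiny MLP surrogate (stand-in for a real network) -------
key = jax.random.PRNGKey(0)
x = jnp.linspace(-1.0, 1.0, 50)[:, None]
y = jnp.sin(3.0 * x) + 0.1 * jax.random.normal(jax.random.PRNGKey(1), x.shape)

n_hidden = 8
n_params = n_hidden + n_hidden + n_hidden + 1          # W1, b1, W2, b2

def unpack(theta):
    w1 = theta[:n_hidden].reshape(1, n_hidden)
    b1 = theta[n_hidden:2 * n_hidden]
    w2 = theta[2 * n_hidden:3 * n_hidden].reshape(n_hidden, 1)
    b2 = theta[-1]
    return w1, b1, w2, b2

def predict(theta, x):
    w1, b1, w2, b2 = unpack(theta)
    return jnp.tanh(x @ w1 + b1) @ w2 + b2

def log_posterior(theta):
    # Gaussian likelihood (noise std 0.1) plus a standard-normal prior.
    resid = y - predict(theta, x)
    return -0.5 * jnp.sum((resid / 0.1) ** 2) - 0.5 * jnp.sum(theta ** 2)

# --- stand-in for a converged mean-field VI posterior N(mu, sigma^2) -------
# In practice mu and sigma would come from VI training on the full network.
mu = 0.1 * jax.random.normal(jax.random.PRNGKey(2), (n_params,))
sigma = jnp.abs(0.05 + 0.2 * jax.random.normal(jax.random.PRNGKey(3), (n_params,)))

# --- keep HMC only for the parameters that matter most for uncertainty -----
# Ranking by VI posterior std is an assumed proxy for the paper's criterion.
k = 6
active = jnp.argsort(-sigma)[:k]          # indices sampled with HMC
frozen_theta = mu                         # everything else fixed at its VI mean

def reduced_log_post(theta_active):
    theta = frozen_theta.at[active].set(theta_active)
    return log_posterior(theta)

grad_logp = jax.grad(reduced_log_post)

# --- plain HMC with a leapfrog integrator on the reduced space -------------
def hmc_step(key, q, step_size=0.01, n_leapfrog=20):
    key, k1, k2 = jax.random.split(key, 3)
    p = jax.random.normal(k1, q.shape)                  # fresh momentum
    q_new, p_new = q, p
    p_new = p_new + 0.5 * step_size * grad_logp(q_new)  # half step
    for _ in range(n_leapfrog - 1):
        q_new = q_new + step_size * p_new
        p_new = p_new + step_size * grad_logp(q_new)
    q_new = q_new + step_size * p_new
    p_new = p_new + 0.5 * step_size * grad_logp(q_new)  # final half step
    # Metropolis accept/reject on the Hamiltonian H = -log p(q) + |p|^2 / 2.
    h_old = -reduced_log_post(q) + 0.5 * jnp.sum(p ** 2)
    h_new = -reduced_log_post(q_new) + 0.5 * jnp.sum(p_new ** 2)
    accept = jnp.log(jax.random.uniform(k2)) < (h_old - h_new)
    return key, jnp.where(accept, q_new, q)

samples = []
q = mu[active]                            # start the chain at the VI means
for _ in range(200):
    key, q = hmc_step(key, q)
    samples.append(q)
samples = jnp.stack(samples)
print("HMC subset posterior mean:", samples[100:].mean(axis=0))
```

Because the HMC state vector has length `k` rather than the full parameter count, each leapfrog step only needs gradients and momentum draws over the active subset, which is the source of the acceleration in spirit; the paper's actual influence criterion, network architectures, and hypersonic-flow application are described in the full text.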
Source journal: Computer Methods in Applied Mechanics and Engineering
CiteScore: 12.70
Self-citation rate: 15.30%
Articles published: 719
Review time: 44 days
About the journal: Computer Methods in Applied Mechanics and Engineering stands as a cornerstone in the realm of computational science and engineering. With a history spanning over five decades, the journal has been a key platform for disseminating papers on advanced mathematical modeling and numerical solutions. Interdisciplinary in nature, these contributions encompass mechanics, mathematics, computer science, and various scientific disciplines. The journal welcomes a broad range of computational methods addressing the simulation, analysis, and design of complex physical problems, making it a vital resource for researchers in the field.