Scalable Empirical Bayes Inference and Bayesian Sensitivity Analysis.

IF 3.9 · JCR Q1 (Statistics & Probability) · CAS Tier 1 (Mathematics)
Statistical Science 39(4): 601-622. Pub Date: 2024-11-01 (Epub: 2024-10-30). DOI: 10.1214/24-sts936. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11654829/pdf/
Hani Doss, Antonio Linero
Citations: 0

Abstract


Consider a Bayesian setup in which we observe Y, whose distribution depends on a parameter θ, that is, Y | θ ~ π_{Y|θ}. The parameter θ is unknown and treated as random, and a prior distribution chosen from some parametric family {ν_h(·), h ∈ ℋ} is to be placed on it. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about θ, but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on θ is estimated from the data. This is usually done by choosing the value of the hyperparameter h that maximizes some criterion. Arguably the most common way of doing this is to let m(h) be the marginal likelihood of h, that is, m(h) = ∫ π_{Y|θ} ν_h(θ) dθ, and choose the value of h that maximizes m(·). Unfortunately, except for a handful of textbook examples, analytic evaluation of argmax_h m(h) is not feasible. The purpose of this paper is two-fold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or don't scale well with the dimension of h, the dimension of θ, or both. Second, we present a method for estimating argmax_h m(h), based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let g be a real-valued function of θ, and let I(h) be the posterior expectation of g(θ) when the prior is ν_h. As a byproduct of our approach, we show how to obtain point estimates and globally-valid confidence bands for the family I(h), h ∈ ℋ. To illustrate the scope of our methodology we provide three detailed examples, having different characters.
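The ideas in the abstract can be made concrete in a toy conjugate model. The sketch below is purely illustrative and is not the paper's algorithm: it assumes a normal-normal setup (θ ~ N(h, 1), Y_i | θ ~ N(θ, 1)) where m(h) is available in closed form, uses a grid search in place of the paper's MCMC-based optimization, and estimates the sensitivity curve I(h) for all h from a single batch of posterior draws (i.i.d. draws standing in for an MCMC sample) via self-normalized importance reweighting with weights ν_h(θ)/ν_ĥ(θ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: theta ~ N(h, 1) prior, Y_1..Y_n | theta ~ N(theta, 1).
n = 50
y = rng.normal(2.0, 1.0, size=n)
ybar = y.mean()

def log_m(h):
    # Log marginal likelihood of h up to a constant: in this conjugate
    # model, ybar ~ N(h, 1 + 1/n) marginally over theta.
    return -0.5 * (ybar - h) ** 2 / (1.0 + 1.0 / n)

# Empirical Bayes: maximize m(.) over a grid (here the closed-form
# answer is h_hat = ybar, which lets us check the estimate).
grid = np.linspace(-5.0, 5.0, 2001)
h_hat = grid[np.argmax([log_m(h) for h in grid])]

# Draws from the posterior under the prior nu_{h_hat}; in this
# conjugate model the posterior is N((n*ybar + h_hat)/(n+1), 1/(n+1)).
post_mean = (n * ybar + h_hat) / (n + 1.0)
post_sd = np.sqrt(1.0 / (n + 1.0))
draws = rng.normal(post_mean, post_sd, size=100_000)

def I_hat(h, g=lambda t: t):
    # Estimate I(h) = E[g(theta) | y, prior nu_h] by reweighting the
    # single sample drawn under nu_{h_hat}:
    # w_j = nu_h(theta_j) / nu_{h_hat}(theta_j), self-normalized.
    logw = -0.5 * (draws - h) ** 2 + 0.5 * (draws - h_hat) ** 2
    w = np.exp(logw - logw.max())  # stabilize before exponentiating
    return np.sum(w * g(draws)) / np.sum(w)

# Closed form for comparison: E[theta | y, nu_h] = (n*ybar + h)/(n+1),
# so I_hat traced over a grid of h gives the sensitivity curve.
```

Reusing one chain's draws for every h is what makes a sensitivity analysis over all of ℋ cheap; the confidence bands described in the abstract would additionally require uniform (over h) control of the Monte Carlo error, which this sketch does not attempt.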

Source journal
Statistical Science (Mathematics - Statistics & Probability)
CiteScore: 6.50
Self-citation rate: 1.80%
Articles per year: 40
Review time: >12 weeks
Journal description: The central purpose of Statistical Science is to convey the richness, breadth and unity of the field by presenting the full range of contemporary statistical thought at a moderate technical level, accessible to the wide community of practitioners, researchers and students of statistics and probability.