Scalable Empirical Bayes Inference and Bayesian Sensitivity Analysis.

Statistical Science, Vol. 39, No. 4 (2024): 601–622. Publication date: 2024-11-01 (epub 2024-10-30). DOI: 10.1214/24-sts936. IF 3.9; JCR Q1, Statistics & Probability.
Hani Doss, Antonio Linero

Abstract

Consider a Bayesian setup in which we observe Y, whose distribution depends on a parameter θ, that is, Y ∣ θ ~ π_{Y∣θ}. The parameter θ is unknown and treated as random, and a prior distribution chosen from some parametric family {π_θ(·; h), h ∈ ℋ} is to be placed on it. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about θ, but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on θ is estimated from the data. This is usually done by choosing the value of the hyperparameter h that maximizes some criterion. Arguably the most common way of doing this is to let m(h) be the marginal likelihood of h, that is, m(h) = ∫ π_{Y∣θ} ν_h(θ) dθ, where ν_h(·) = π_θ(·; h), and choose the value of h that maximizes m(·). Unfortunately, except for a handful of textbook examples, analytic evaluation of argmax_h m(h) is not feasible. The purpose of this paper is two-fold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or do not scale well with the dimension of h, the dimension of θ, or both. Second, we present a method for estimating argmax_h m(h), based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let g be a real-valued function of θ, and let I(h) be the posterior expectation of g(θ) when the prior is ν_h. As a byproduct of our approach, we show how to obtain point estimates and globally valid confidence bands for the family I(h), h ∈ ℋ. To illustrate the scope of our methodology we provide three detailed examples, having different characters.
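A minimal sketch of the empirical Bayes recipe the abstract describes, in an invented conjugate toy model (this is an illustration of the general idea, not the paper's algorithm): with θ ~ ν_h = N(h, 1) and Y ∣ θ ~ N(θ, 1), the marginal likelihood is m(h) = N(y; h, 2), so argmax_h m(h) = y exactly. The identity m(h)/m(h₀) = E[ν_h(θ)/ν_{h₀}(θ)], where the expectation is over the posterior of θ under a baseline prior h₀, lets one trace the whole curve m(·) (up to a constant) from a single set of posterior draws, which is the core idea behind MCMC-based estimation of argmax_h m(h). Here that posterior is available in closed form, so we sample it directly instead of running a Markov chain.

```python
import numpy as np

rng = np.random.default_rng(0)
y, h0 = 1.7, 0.0  # observation and baseline hyperparameter (illustrative values)

# Posterior of theta under the baseline prior N(h0, 1): N((y + h0)/2, 1/2).
theta = rng.normal((y + h0) / 2, np.sqrt(0.5), size=100_000)

def log_prior(th, h):
    # log nu_h(th) for the N(h, 1) prior, up to an additive constant
    return -0.5 * (th - h) ** 2

# Estimate B(h) = m(h)/m(h0) on a grid by averaging the prior ratio
# over the same posterior draws.
grid = np.linspace(-5, 5, 201)
B = np.array([np.mean(np.exp(log_prior(theta, h) - log_prior(theta, h0)))
              for h in grid])
h_hat = grid[np.argmax(B)]  # should land near the exact maximizer, h = y

# Sensitivity analysis: with g(theta) = theta, the posterior expectation
# I(h) = E[g(theta) | Y] equals (y + h)/2 in this model. Reweighting the
# same draws by nu_h / nu_h0 estimates I(h) for any h without new sampling.
w = np.exp(log_prior(theta, h_hat) - log_prior(theta, h0))
I_hat = np.sum(w * theta) / np.sum(w)
print(h_hat, I_hat)  # both close to 1.7 in this example
```

The same reweighting evaluated across the whole grid yields a Monte Carlo estimate of the entire family I(h), h ∈ ℋ, from one simulation run; quantifying the simulation error of that family uniformly in h is what the paper's globally valid confidence bands address.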
