{"title":"可扩展经验贝叶斯推理与贝叶斯敏感性分析。","authors":"Hani Doss, Antonio Linero","doi":"10.1214/24-sts936","DOIUrl":null,"url":null,"abstract":"<p><p>Consider a Bayesian setup in which we observe <math><mi>Y</mi></math> , whose distribution depends on a parameter <math><mi>θ</mi></math> , that is, <math><mi>Y</mi> <mo>∣</mo> <mi>θ</mi> <mspace></mspace> <mo>~</mo> <mspace></mspace> <msub><mrow><mi>π</mi></mrow> <mrow><mi>Y</mi> <mo>∣</mo> <mi>θ</mi></mrow> </msub> </math> . The parameter <math><mi>θ</mi></math> is unknown and treated as random, and a prior distribution chosen from some parametric family <math> <mfenced> <mrow> <msub><mrow><mi>π</mi></mrow> <mrow><mi>θ</mi></mrow> </msub> <mo>(</mo> <mo>⋅</mo> <mo>;</mo> <mi>h</mi> <mo>)</mo> <mo>,</mo> <mi>h</mi> <mo>∈</mo> <mi>ℋ</mi></mrow> </mfenced> </math> , is to be placed on it. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about <math><mi>θ</mi></math> , but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on <math><mi>θ</mi></math> is estimated from the data. This is usually done by choosing the value of the hyperparameter <math><mi>h</mi></math> that maximizes some criterion. Arguably the most common way of doing this is to let <math><mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> be the marginal likelihood of <math><mi>h</mi></math> , that is, <math><mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo> <mo>=</mo> <mo>∫</mo> <msub><mrow><mi>π</mi></mrow> <mrow><mi>Y</mi> <mspace></mspace> <mo>∣</mo> <mspace></mspace> <mi>θ</mi></mrow> </msub> <msub><mrow><mi>v</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mo>(</mo> <mi>θ</mi> <mo>)</mo> <mspace></mspace> <mi>d</mi> <mi>θ</mi></math> , and choose the value of <math><mi>h</mi></math> that maximizes <math><mi>m</mi> <mo>(</mo> <mo>⋅</mo> <mo>)</mo></math> . Unfortunately, except for a handful of textbook examples, analytic evaluation of <math><mi>a</mi> <mi>r</mi> <mi>g</mi> <mspace></mspace> <msub><mrow><mi>m</mi> <mi>a</mi> <mi>x</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mspace></mspace> <mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> is not feasible. The purpose of this paper is two-fold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or don't scale well with the dimension of <math><mi>h</mi></math> , the dimension of <math><mi>θ</mi></math> , or both. Second, we present a method for estimating <math><mi>a</mi> <mi>r</mi> <mi>g</mi> <mspace></mspace> <msub><mrow><mi>m</mi> <mi>a</mi> <mi>x</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mspace></mspace> <mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> , based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let <math><mi>g</mi></math> be a real-valued function of <math><mi>θ</mi></math> , and let <math><mi>I</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> be the posterior expectation of <math><mi>g</mi> <mo>(</mo> <mi>θ</mi> <mo>)</mo></math> when the prior is <math> <msub><mrow><mi>v</mi></mrow> <mrow><mi>h</mi></mrow> </msub> </math> . As a byproduct of our approach, we show how to obtain point estimates and globally-valid confidence bands for the family <math><mi>I</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> , <math><mi>h</mi> <mo>∈</mo> <mi>ℋ</mi></math> . 
To illustrate the scope of our methodology we provide three detailed examples, having different characters.</p>","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"39 4","pages":"601-622"},"PeriodicalIF":3.9000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11654829/pdf/","citationCount":"0","resultStr":"{\"title\":\"Scalable Empirical Bayes Inference and Bayesian Sensitivity Analysis.\",\"authors\":\"Hani Doss, Antonio Linero\",\"doi\":\"10.1214/24-sts936\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Consider a Bayesian setup in which we observe <math><mi>Y</mi></math> , whose distribution depends on a parameter <math><mi>θ</mi></math> , that is, <math><mi>Y</mi> <mo>∣</mo> <mi>θ</mi> <mspace></mspace> <mo>~</mo> <mspace></mspace> <msub><mrow><mi>π</mi></mrow> <mrow><mi>Y</mi> <mo>∣</mo> <mi>θ</mi></mrow> </msub> </math> . The parameter <math><mi>θ</mi></math> is unknown and treated as random, and a prior distribution chosen from some parametric family <math> <mfenced> <mrow> <msub><mrow><mi>π</mi></mrow> <mrow><mi>θ</mi></mrow> </msub> <mo>(</mo> <mo>⋅</mo> <mo>;</mo> <mi>h</mi> <mo>)</mo> <mo>,</mo> <mi>h</mi> <mo>∈</mo> <mi>ℋ</mi></mrow> </mfenced> </math> , is to be placed on it. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about <math><mi>θ</mi></math> , but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on <math><mi>θ</mi></math> is estimated from the data. This is usually done by choosing the value of the hyperparameter <math><mi>h</mi></math> that maximizes some criterion. Arguably the most common way of doing this is to let <math><mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> be the marginal likelihood of <math><mi>h</mi></math> , that is, <math><mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo> <mo>=</mo> <mo>∫</mo> <msub><mrow><mi>π</mi></mrow> <mrow><mi>Y</mi> <mspace></mspace> <mo>∣</mo> <mspace></mspace> <mi>θ</mi></mrow> </msub> <msub><mrow><mi>v</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mo>(</mo> <mi>θ</mi> <mo>)</mo> <mspace></mspace> <mi>d</mi> <mi>θ</mi></math> , and choose the value of <math><mi>h</mi></math> that maximizes <math><mi>m</mi> <mo>(</mo> <mo>⋅</mo> <mo>)</mo></math> . Unfortunately, except for a handful of textbook examples, analytic evaluation of <math><mi>a</mi> <mi>r</mi> <mi>g</mi> <mspace></mspace> <msub><mrow><mi>m</mi> <mi>a</mi> <mi>x</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mspace></mspace> <mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> is not feasible. The purpose of this paper is two-fold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or don't scale well with the dimension of <math><mi>h</mi></math> , the dimension of <math><mi>θ</mi></math> , or both. Second, we present a method for estimating <math><mi>a</mi> <mi>r</mi> <mi>g</mi> <mspace></mspace> <msub><mrow><mi>m</mi> <mi>a</mi> <mi>x</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mspace></mspace> <mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> , based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. 
Let <math><mi>g</mi></math> be a real-valued function of <math><mi>θ</mi></math> , and let <math><mi>I</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> be the posterior expectation of <math><mi>g</mi> <mo>(</mo> <mi>θ</mi> <mo>)</mo></math> when the prior is <math> <msub><mrow><mi>v</mi></mrow> <mrow><mi>h</mi></mrow> </msub> </math> . As a byproduct of our approach, we show how to obtain point estimates and globally-valid confidence bands for the family <math><mi>I</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> , <math><mi>h</mi> <mo>∈</mo> <mi>ℋ</mi></math> . To illustrate the scope of our methodology we provide three detailed examples, having different characters.</p>\",\"PeriodicalId\":51172,\"journal\":{\"name\":\"Statistical Science\",\"volume\":\"39 4\",\"pages\":\"601-622\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2024-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11654829/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Statistical Science\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1214/24-sts936\",\"RegionNum\":1,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/10/30 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Science","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/24-sts936","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/10/30 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Abstract
Consider a Bayesian setup in which we observe $Y$, whose distribution depends on a parameter $\theta$, that is, $Y \mid \theta \sim \pi_{Y \mid \theta}$. The parameter $\theta$ is unknown and treated as random, and a prior distribution chosen from some parametric family $\{\pi_\theta(\cdot\,; h),\, h \in \mathcal{H}\}$ is to be placed on it; write $\nu_h = \pi_\theta(\cdot\,; h)$. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about $\theta$, but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on $\theta$ is estimated from the data. This is usually done by choosing the value of the hyperparameter $h$ that maximizes some criterion. Arguably the most common way of doing this is to let $m(h)$ be the marginal likelihood of $h$, that is, $m(h) = \int \pi_{Y \mid \theta}\, \nu_h(\theta)\, d\theta$, and choose the value of $h$ that maximizes $m(\cdot)$. Unfortunately, except for a handful of textbook examples, analytic evaluation of $\arg\max_h m(h)$ is not feasible. The purpose of this paper is two-fold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or do not scale well with the dimension of $h$, the dimension of $\theta$, or both. Second, we present a method for estimating $\arg\max_h m(h)$, based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let $g$ be a real-valued function of $\theta$, and let $I(h)$ be the posterior expectation of $g(\theta)$ when the prior is $\nu_h$. As a byproduct of our approach, we show how to obtain point estimates and globally valid confidence bands for the family $I(h)$, $h \in \mathcal{H}$. To illustrate the scope of our methodology we provide three detailed examples, having different characters.
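The Python sketch below illustrates the two quantities the abstract revolves around. It uses a toy normal-normal model (an assumption for illustration, not from the paper) where $m(h)$ has a closed form, so the exact empirical Bayes answer can be checked, and then shows one standard MCMC-based route to $\arg\max_h m(h)$ and to point estimates of $I(h)$: the identity $m(h)/m(h_1) = E[\nu_h(\theta)/\nu_{h_1}(\theta) \mid Y, h_1]$ lets posterior draws under a single baseline $h_1$ estimate the whole marginal-likelihood surface up to a constant. This is consistent with the abstract's description but is not necessarily the authors' exact estimator; the model, grid, baseline, and helper names are all illustrative.

```python
# A minimal sketch of empirical Bayes by marginal-likelihood maximization
# in a toy normal-normal model. Illustrative assumptions throughout; this
# is not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Toy model: prior nu_h: theta ~ N(h, 1); likelihood: Y | theta ~ N(theta, 1).
# Marginally Y ~ N(h, 2), so m(h) is known exactly here.
y = 1.3  # a single observed data point

def log_prior(theta, h):
    # log nu_h(theta) for the N(h, 1) prior
    return -0.5 * np.log(2 * np.pi) - (theta - h) ** 2 / 2.0

def log_m_exact(h):
    # log m(h): the N(h, 2) marginal density evaluated at y
    return -0.5 * np.log(4 * np.pi) - (y - h) ** 2 / 4.0

grid = np.linspace(-3.0, 3.0, 601)
h_hat = grid[np.argmax(log_m_exact(grid))]
print("exact argmax_h m(h):", h_hat)  # equals y in this toy model

# MCMC-style route when m(h) has no closed form: fix a baseline h1, draw
# theta_1,...,theta_N from the posterior given Y under h1, and use
#   m(h) / m(h1) = E[ nu_h(theta) / nu_h1(theta) | Y, h1 ],
# so a posterior average of prior ratios traces m(h) up to a constant;
# the same h maximizes m(h) and m(h)/m(h1).
h1 = 0.0
post_mean, post_var = (y + h1) / 2.0, 0.5  # exact posterior under h1
theta = rng.normal(post_mean, np.sqrt(post_var), size=100_000)

def log_m_ratio(h):
    # log of (1/N) * sum_i nu_h(theta_i) / nu_h1(theta_i), computed stably
    w = log_prior(theta, h) - log_prior(theta, h1)
    m = w.max()
    return m + np.log(np.mean(np.exp(w - m)))

h_mcmc = grid[np.argmax([log_m_ratio(h) for h in grid])]
print("Monte Carlo estimate of argmax_h m(h):", h_mcmc)

# The same weighted draws give point estimates of the sensitivity curve
# I(h), the posterior expectation of g(theta) under prior nu_h, via
# self-normalized importance weights. (The paper's globally valid
# confidence bands for {I(h) : h in H} need machinery beyond this sketch.)
def I_hat(h, g=lambda t: t):
    w = np.exp(log_prior(theta, h) - log_prior(theta, h1))
    return float(np.sum(w * g(theta)) / np.sum(w))

print("I(h_mcmc) for g(theta) = theta:", I_hat(h_mcmc))
```

The design point the ratio trick illustrates: estimating $m(h)/m(h_1)$ rather than $m(h)$ itself sidesteps the intractable normalizing constant, which is why a single set of posterior draws under one baseline hyperparameter can serve the entire family $h \in \mathcal{H}$.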
Journal Introduction:
The central purpose of Statistical Science is to convey the richness, breadth and unity of the field by presenting the full range of contemporary statistical thought at a moderate technical level, accessible to the wide community of practitioners, researchers and students of statistics and probability.