Scalable Bayesian High-dimensional Local Dependence Learning
Kyoungjae Lee and Lizhen Lin
Bayesian Analysis (Journal Article). Published 2021-09-24. DOI: 10.1214/21-ba1299
Abstract: In this work, we propose a scalable Bayesian procedure for learning the local dependence structure in a high-dimensional model where the variables possess a natural ordering. The ordering of variables can be indexed by time, the vicinity of spatial locations, and so on, with the natural assumption that variables far apart tend to be weakly correlated. Applications of such models abound in a variety of fields, such as finance, genome association analysis, and spatial modeling. We adopt a flexible framework under which each variable depends on its neighbors or predecessors, and the neighborhood size can vary across variables. It is of great interest to reveal this local dependence structure by estimating the covariance or precision matrix while yielding a consistent estimate of the varying neighborhood size for each variable. The existing literature on banded covariance matrix estimation, which assumes a fixed bandwidth, cannot be adapted to this more general setup. We employ the modified Cholesky decomposition of the precision matrix and design a flexible prior for this model through appropriate priors on the neighborhood sizes and Cholesky factors. We derive posterior contraction rates for the Cholesky factor that are nearly or exactly minimax optimal, and our procedure yields consistent estimates of the neighborhood size for all variables. Another appealing feature of our procedure is its scalability to models with large numbers of variables, owing to efficient posterior inference that does not resort to MCMC algorithms. Numerical comparisons with competing methods are carried out, and applications to real datasets are considered.

The proposed prior, the LANCE prior, allows exact computation of posteriors, which enables scalable inference even in high-dimensional settings. Furthermore, it provides a scalable Bayesian cross-validation for choosing the hyperparameters. We establish selection consistency for the local dependence structure and posterior convergence rates for the Cholesky factor, under conditions significantly weaker than those required in the existing literature. Simulation studies in various settings show that the LANCE prior outperforms other contenders in terms of ROC curves, cross-validation-based analysis, and computation time. Two real data analyses, based on phone call center and gun point data, illustrate the satisfactory performance of the proposed method in linear prediction and classification problems, respectively.
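The modified Cholesky framework the abstract describes can be sketched numerically. For ordered variables, the precision matrix factors as Omega = (I - A)' D^{-1} (I - A), where A is strictly lower triangular and row j of A is nonzero only for the k_j immediate predecessors of variable j (its varying neighborhood size). The sketch below is illustrative only, under assumed variable names and randomly drawn bandwidths; it is not the authors' implementation and the helper `precision_from_cholesky` is hypothetical.

```python
import numpy as np

def precision_from_cholesky(A, d):
    """Assemble Omega = (I - A)^T D^{-1} (I - A) from the modified
    Cholesky components: strictly lower-triangular A and variances d."""
    p = A.shape[0]
    L = np.eye(p) - A                     # unit lower triangular
    return L.T @ np.diag(1.0 / d) @ L     # symmetric positive definite

rng = np.random.default_rng(0)
p = 8

# Varying neighborhood sizes k_j: variable j depends on at most
# its k_j immediate predecessors (k_0 = 0 by convention).
bandwidths = rng.integers(0, 4, size=p)
bandwidths[0] = 0

# Fill row j of A only within its local neighborhood.
A = np.zeros((p, p))
for j in range(p):
    k = min(int(bandwidths[j]), j)
    if k > 0:
        A[j, j - k:j] = rng.normal(scale=0.3, size=k)

d = rng.uniform(0.5, 1.5, size=p)         # innovation variances
Omega = precision_from_cholesky(A, d)     # banded-ish precision matrix
```

Because I - A is unit lower triangular and D is positive, Omega is guaranteed symmetric positive definite regardless of the neighborhood sizes, which is what makes the per-row bandwidth k_j a free modeling choice.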
Journal Introduction:
Bayesian Analysis is an electronic journal of the International Society for Bayesian Analysis. It seeks to publish a wide range of articles that demonstrate or discuss Bayesian methods in some theoretical or applied context. The journal welcomes submissions involving presentation of new computational and statistical methods; critical reviews and discussions of existing approaches; historical perspectives; description of important scientific or policy application areas; case studies; and methods for experimental design, data collection, data sharing, or data mining.
Evaluation of submissions is based on importance of content and effectiveness of communication. Discussion papers are typically chosen by the Editor-in-Chief, or suggested by an Editor, from among the regular submissions. In addition, the journal encourages individual authors to submit manuscripts for consideration as discussion papers.