Scalable Bayesian High-dimensional Local Dependence Learning
Kyoungjae Lee and Lizhen Lin
Bayesian Analysis (Journal Article). Published 2021-09-24. DOI: 10.1214/21-ba1299
Abstract: In this work, we propose a scalable Bayesian procedure for learning the local dependence structure in a high-dimensional model where the variables possess a natural ordering. The ordering of variables can be indexed by time, the vicinity of spatial locations, and so on, with the natural assumption that variables far apart tend to be weakly correlated. Applications of such models abound in a variety of fields, such as finance, genome association analysis, and spatial modeling. We adopt a flexible framework under which each variable depends on its neighbors or predecessors, and the neighborhood size can vary across variables. It is of great interest to reveal this local dependence structure by estimating the covariance or precision matrix while yielding a consistent estimate of the varying neighborhood size for each variable. The existing literature on banded covariance matrix estimation, which assumes a fixed bandwidth, cannot be adapted to this more general setup. We employ the modified Cholesky decomposition of the precision matrix and design a flexible prior for this model through appropriate priors on the neighborhood sizes and Cholesky factors. We derive posterior contraction rates for the Cholesky factor that are nearly or exactly minimax optimal, and our procedure yields consistent estimates of the neighborhood size for all variables. Another appealing feature of our procedure is its scalability to models with large numbers of variables, owing to efficient posterior inference that does not resort to MCMC algorithms. Numerical comparisons with competing methods are carried out, and applications to real datasets are considered.

The proposed prior, the LANCE prior, allows exact computation of posteriors, which enables scalable inference even in high-dimensional settings. Furthermore, it provides a scalable Bayesian cross-validation for choosing the hyperparameters. We establish selection consistency for the local dependence structure and posterior convergence rates for the Cholesky factor, under conditions significantly weaker than those required in the existing literature. Simulation studies in various settings show that the LANCE prior outperforms other contenders in terms of ROC curves, cross-validation-based analysis, and computation time. Two real data analyses, based on phone call center and gun point data, illustrate the satisfactory performance of the proposed method in linear prediction and classification problems, respectively.
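The modified Cholesky framework the abstract describes can be sketched numerically. For ordered variables, the precision matrix factors as Omega = (I - A)' D^{-1} (I - A), where A is strictly lower triangular and row j of A is nonzero only for the k_j immediate predecessors of variable j (its varying neighborhood size). The sketch below is illustrative only, under assumed variable names and randomly drawn bandwidths; it is not the authors' implementation and the helper `precision_from_cholesky` is hypothetical.

```python
import numpy as np

def precision_from_cholesky(A, d):
    """Assemble Omega = (I - A)^T D^{-1} (I - A) from the modified
    Cholesky components: strictly lower-triangular A and variances d."""
    p = A.shape[0]
    L = np.eye(p) - A                     # unit lower triangular
    return L.T @ np.diag(1.0 / d) @ L     # symmetric positive definite

rng = np.random.default_rng(0)
p = 8

# Varying neighborhood sizes k_j: variable j depends on at most
# its k_j immediate predecessors (k_0 = 0 by convention).
bandwidths = rng.integers(0, 4, size=p)
bandwidths[0] = 0

# Fill row j of A only within its local neighborhood.
A = np.zeros((p, p))
for j in range(p):
    k = min(int(bandwidths[j]), j)
    if k > 0:
        A[j, j - k:j] = rng.normal(scale=0.3, size=k)

d = rng.uniform(0.5, 1.5, size=p)         # innovation variances
Omega = precision_from_cholesky(A, d)     # banded-ish precision matrix
```

Because I - A is unit lower triangular and D is positive, Omega is guaranteed symmetric positive definite regardless of the neighborhood sizes, which is what makes the per-row bandwidth k_j a free modeling choice.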
Journal Introduction:
Bayesian Analysis is an electronic journal of the International Society for Bayesian Analysis. It seeks to publish a wide range of articles that demonstrate or discuss Bayesian methods in some theoretical or applied context. The journal welcomes submissions involving presentation of new computational and statistical methods; critical reviews and discussions of existing approaches; historical perspectives; description of important scientific or policy application areas; case studies; and methods for experimental design, data collection, data sharing, or data mining.
Evaluation of submissions is based on importance of content and effectiveness of communication. Discussion papers are typically chosen by the Editor-in-Chief, or suggested by an Editor, from among the regular submissions. In addition, the journal encourages individual authors to submit manuscripts for consideration as discussion papers.