Spike and slab Bayesian sparse principal component analysis

IF 1.6 | Zone 2 (Mathematics) | JCR Q2: COMPUTER SCIENCE, THEORY & METHODS

Statistics and Computing Pub Date : 2024-05-13 DOI:10.1007/s11222-024-10430-8

Yu-Chien Bo Ning, Ning Ning


Abstract

Sparse principal component analysis (SPCA) is a popular tool for dimensionality reduction in high-dimensional data. However, theoretically justified Bayesian SPCA methods that also scale well computationally are still lacking. A major challenge in Bayesian SPCA is choosing an appropriate prior for the loadings matrix, given that the principal components are mutually orthogonal. We propose a novel parameter-expanded coordinate ascent variational inference (PX-CAVI) algorithm. The algorithm uses a spike and slab prior that incorporates parameter expansion to cope with the orthogonality constraint. In addition to comparing against two popular SPCA approaches, we introduce the PX-EM algorithm, an EM analogue of PX-CAVI, as a further comparison. Extensive numerical simulations show that the PX-CAVI algorithm outperforms these SPCA approaches. We also study the posterior contraction rate of the variational posterior, a novel contribution to the existing literature. The PX-CAVI algorithm is then applied to a lung cancer gene expression dataset. The R package VBsparsePCA, which implements the algorithm, is available on the Comprehensive R Archive Network (CRAN).
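The role of parameter expansion can be illustrated with a small sketch. It is not the PX-CAVI algorithm and does not use the VBsparsePCA package. It assumes a row-sparse loadings matrix theta, a Gaussian slab chosen purely for convenience, and a model in which the data enter only through theta theta^T. All function names are hypothetical. Under those assumptions, multiplying theta by any orthogonal matrix A leaves theta theta^T and the set of nonzero rows unchanged, which is the kind of invariance an expanded parameterization can exploit instead of enforcing orthogonal columns directly.

```python
import math
import random


def sample_row_sparse_loadings(p, r, incl_prob, slab_sd, rng):
    """Draw a p x r matrix row by row from an illustrative spike and slab prior.

    Each row is zero (spike) with probability 1 - incl_prob; otherwise its
    entries are i.i.d. Gaussian (an assumed slab, chosen only for simplicity).
    """
    rows = []
    for _ in range(p):
        if rng.random() < incl_prob:
            rows.append([rng.gauss(0.0, slab_sd) for _ in range(r)])
        else:
            rows.append([0.0] * r)
    return rows


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rotation(angle):
    """A 2 x 2 orthogonal (rotation) matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


rng = random.Random(0)
theta = sample_row_sparse_loadings(p=8, r=2, incl_prob=0.3, slab_sd=1.0, rng=rng)
A = rotation(0.7)
beta = matmul(theta, A)  # expanded parameter: beta = theta A

# theta theta^T and beta beta^T agree up to floating-point error.
gram_theta = matmul(theta, transpose(theta))
gram_beta = matmul(beta, transpose(beta))
max_gap = max(
    abs(x - y) for ra, rb in zip(gram_theta, gram_beta) for x, y in zip(ra, rb)
)

# The rotation mixes columns only, so zero rows stay zero.
support_theta = [any(v != 0.0 for v in row) for row in theta]
support_beta = [any(v != 0.0 for v in row) for row in beta]

print("max |theta theta^T - beta beta^T| =", max_gap)
print("row support preserved:", support_theta == support_beta)
```

Because the rotation acts on columns, a sparsity prior placed on rows is unaffected by it, so row-wise spike and slab priors pair naturally with this kind of expansion.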


Source journal: Statistics and Computing (Mathematics - Computer Science: Theory & Methods)

CiteScore: 3.20

Self-citation rate: 4.50%

Review time: 6-12 weeks

Journal description: Statistics and Computing is a bi-monthly refereed journal which publishes papers covering the range of the interface between the statistical and computing sciences. In particular, it addresses the use of statistical concepts in computing science, for example in machine learning, computer vision and data analytics, as well as the use of computers in data modelling, prediction and analysis. Specific topics which are covered include: techniques for evaluating analytically intractable problems such as bootstrap resampling, Markov chain Monte Carlo, sequential Monte Carlo, approximate Bayesian computation, search and optimization methods, stochastic simulation and Monte Carlo, graphics, computer environments, statistical approaches to software errors, information retrieval, machine learning, statistics of databases and database technology, huge data sets and big data analytics, computer algebra, graphical models, image processing, tomography, inverse problems and uncertainty quantification. In addition, the journal contains original research reports, authoritative review papers, discussed papers, and occasional special issues on particular topics or carrying proceedings of relevant conferences. Statistics and Computing also publishes book review and software review sections.