Penalized Principal Component Analysis Using Smoothing.

ArXiv Pub Date: 2025-03-03

Rebecca M Hurwitz, Georg Hahn



Abstract

Principal components computed via PCA (principal component analysis) are traditionally used to reduce dimensionality in genomic data or to correct for population stratification. In this paper, we explore the penalized eigenvalue problem (PEP), which reformulates the computation of the first eigenvector as an optimization problem and adds an $L_1$ penalty constraint to enforce sparseness of the solution. The contribution of our article is threefold. First, we extend PEP by applying smoothing to the original LASSO-type $L_1$ penalty. This allows one to compute analytical gradients, which enable faster and more efficient minimization of the objective function associated with the optimization problem. Second, we demonstrate how higher-order eigenvectors can be calculated with PEP using established results from singular value decomposition (SVD). Third, we present four experimental studies to demonstrate the usefulness of the smoothed penalized eigenvectors. Using data from the 1000 Genomes Project, we empirically demonstrate that our proposed smoothed PEP allows one to increase numerical stability and obtain meaningful eigenvectors. We also employ the penalized eigenvector approach in two additional real-data applications (computation of a polygenic risk score and clustering), demonstrating that exchanging the penalized eigenvectors for their smoothed counterparts can increase prediction accuracy in polygenic risk scores and enhance discernibility of clusterings. Moreover, we compare our proposed smoothed PEP to seven state-of-the-art algorithms for sparse PCA and evaluate the accuracy of the obtained eigenvectors, their support recovery, and their runtime.
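The abstract does not spell out the objective function or the smoothing that is used, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the penalized objective is $-v^\top X v + \lambda \sum_i |v_i|$ on the unit sphere, replaces $|v_i|$ with a Huber-type surrogate (one common smooth approximation of the absolute value, with smoothing parameter `mu`), and uses Hotelling deflation as a stand-in for the SVD-based computation of higher-order eigenvectors. All function names and default parameters are hypothetical.

```python
import numpy as np


def huber(z, mu):
    """Smooth surrogate for |z|: quadratic on [-mu, mu], linear outside."""
    a = np.abs(z)
    return np.where(a <= mu, z * z / (2.0 * mu), a - mu / 2.0)


def huber_grad(z, mu):
    """Analytical derivative of the surrogate; defined everywhere, also at z = 0."""
    return np.clip(z / mu, -1.0, 1.0)


def objective(v, X, lam, mu):
    """Assumed objective: negative quadratic form plus smoothed L1 penalty."""
    return -v @ X @ v + lam * huber(v, mu).sum()


def smoothed_pep(X, lam=0.1, mu=0.05, n_iter=500, seed=0):
    """Gradient steps followed by renormalization onto the unit sphere."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(X.shape[0])
    v /= np.linalg.norm(v)
    # Step size from a Lipschitz bound on the gradient of the assumed objective.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) + lam / mu)
    for _ in range(n_iter):
        grad = -2.0 * X @ v + lam * huber_grad(v, mu)
        v = v - step * grad
        v /= np.linalg.norm(v)  # project back onto ||v||_2 = 1
    return v


def deflate(X, v):
    """Hotelling deflation: remove the component along v before the next run."""
    return X - (v @ X @ v) * np.outer(v, v)
```

With `lam=0` the iteration reduces to a shifted power method and returns the leading eigenvector of a symmetric positive semidefinite `X` up to sign. Calling `smoothed_pep(deflate(X, v1))` then targets the next eigenvector. For `lam > 0` the smooth penalty shrinks small entries toward zero without producing exact zeros, so a final thresholding step would be needed if exact sparsity is required.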

