Sparse matrix factorization robust to sample sharing across GWASs reveals interpretable genetic components.

IF 8.1 | Zone 1, Biology | Q1 GENETICS & HEREDITY
American journal of human genetics Pub Date : 2025-09-04 Epub Date: 2025-07-28 DOI:10.1016/j.ajhg.2025.07.003
Ashton R Omdahl, Joshua S Weinstock, Rebecca Keener, Surya B Chhetri, Marios Arvanitis, Alexis Battle
Pages: 2178-2197 | PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12461026/pdf/
Citations: 0

Abstract

Sparse matrix factorization robust to sample sharing across GWASs reveals interpretable genetic components.

Complex trait-associated genetic variation is highly pleiotropic. This extensive pleiotropy implies that multi-phenotype analyses are informative for characterizing genetic associations, as they facilitate the discovery of trait-shared and trait-specific variants and pathways ("genetic factors"). Previous efforts have estimated genetic factors using matrix factorization (MF) applied to numerous genome-wide association studies (GWASs). However, existing methods are susceptible to spurious factors arising from residual confounding due to sample sharing in biobank GWASs. Furthermore, MF approaches have historically estimated dense factors, loaded on most traits and variants, that are challenging to map onto interpretable biological pathways. To address these shortcomings, we introduce "GWAS latent embeddings accounting for noise and regularization" (GLEANR), an MF method for detection of sparse genetic factors from summary statistics. GLEANR accounts for sample sharing between studies and uses regularization to estimate a data-driven number of interpretable factors. GLEANR is robust to confounding induced by shared samples and improves the replication of genetic factors derived from distinct biobanks. We used GLEANR to evaluate 137 diverse GWASs from the UK Biobank, identifying 58 factors that decompose the genetic architecture of input traits and have distinct signatures of negative selection and degrees of polygenicity. These sparse factors can be interpreted with respect to disease, cell type, and pathway enrichment. We highlight three such factors that captured platelet-measure phenotypes and were enriched for disease-relevant markers corresponding to distinct stages of platelet differentiation. Overall, GLEANR is a powerful tool for discovering both trait-specific and trait-shared pathways underlying complex traits from GWAS summary statistics.
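The two ideas the abstract names, adjusting for sample sharing and using regularization to obtain sparse factors, can be illustrated with a small generic sketch. This is not the GLEANR algorithm, whose details are not given here. It assumes a variants-by-traits matrix `Z` of summary statistics and an already estimated trait-by-trait noise correlation matrix `C` attributed to shared samples. The function names (`whiten_traits`, `soft_threshold`, `sparse_mf`, `objective`) and the penalty weight `lam` are hypothetical choices made for illustration only.

```python
import numpy as np


def whiten_traits(Z, C):
    """Decorrelate the trait columns of Z given a noise correlation matrix C.

    Assumes each row of Z has noise covariance C (traits x traits).
    With C = L @ L.T, every row z is mapped to L^{-1} z.
    """
    L = np.linalg.cholesky(C)
    return np.linalg.solve(L, Z.T).T


def soft_threshold(X, lam):
    """Proximal operator of lam * ||X||_1 (elementwise shrinkage toward zero)."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)


def objective(Z, U, V, lam):
    """0.5 * ||Z - U V^T||_F^2 + lam * (||U||_1 + ||V||_1)."""
    resid = Z - U @ V.T
    return 0.5 * np.sum(resid ** 2) + lam * (np.abs(U).sum() + np.abs(V).sum())


def sparse_mf(Z, k, lam=0.1, n_iter=200, seed=0):
    """Alternating proximal-gradient updates for an L1-penalized factorization.

    Generic illustration only: U holds variant loadings (variants x k),
    V holds trait loadings (traits x k).
    """
    rng = np.random.default_rng(seed)
    m, n = Z.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    for _ in range(n_iter):
        # Step size is 1 / Lipschitz constant of the smooth part in U.
        step_u = 1.0 / (np.linalg.norm(V.T @ V, 2) + 1e-12)
        U = soft_threshold(U - step_u * (U @ V.T - Z) @ V, step_u * lam)
        # Same update for V with U held fixed.
        step_v = 1.0 / (np.linalg.norm(U.T @ U, 2) + 1e-12)
        V = soft_threshold(V - step_v * (U @ V.T - Z).T @ U, step_v * lam)
    return U, V
```

A call such as `U, V = sparse_mf(whiten_traits(Z, C), k=5, lam=0.5)` would factorize the decorrelated matrix. Larger `lam` values drive more loadings to exactly zero. Note that in this simplified version the sparsity penalty acts on loadings in the whitened space, and the number of factors `k` is fixed by hand rather than chosen from the data as the abstract describes.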

Source journal: American journal of human genetics
CiteScore: 14.70
Self-citation rate: 4.10%
Articles published: 185
Review time: 1 month
About the journal: The American Journal of Human Genetics (AJHG) is a monthly journal published by Cell Press, chosen by The American Society of Human Genetics (ASHG) as its premier publication starting from January 2008. AJHG represents Cell Press's first society-owned journal, and both ASHG and Cell Press anticipate significant synergies between AJHG content and that of other Cell Press titles.