Performance of deep-learning-based approaches to improve polygenic scores

IF 15.7, Zone 1 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES

Nature Communications Pub Date : 2025-06-02 DOI:10.1038/s41467-025-60056-1

Martin Kelemen, Yu Xu, Tao Jiang, Jing Hua Zhao, Carl A. Anderson, Chris Wallace, Adam Butterworth, Michael Inouye


Citations: 0

Abstract

Polygenic scores, which estimate an individual’s genetic propensity for a disease or trait, have the potential to become part of genomic healthcare. Neural-network based deep-learning has emerged as a method of intense interest to model complex, nonlinear phenomena, which may be adapted to exploit gene-gene and gene-environment interactions to potentially improve polygenic scores. We fit neural-network models to both simulated and 28 real traits in the UK Biobank. To infer the amount of nonlinearity present in a phenotype, we also present a framework using neural-networks, which controls for the potential confounding effect of linkage disequilibrium. Although we found evidence for small amounts of nonlinear effects, neural-network models were outperformed by linear regression models for both genetic-only and genetic+environmental input scenarios. In this work, we find that the usefulness of neural-networks for generating polygenic scores may currently be limited and confounded by joint tagging effects due to linkage disequilibrium.
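For orientation, a polygenic score in its usual additive form is a weighted sum of an individual's allele dosages, which is the kind of linear model the abstract compares neural networks against. The sketch below is only an illustration under stated assumptions: the function names, the toy weights, and the single gene-gene interaction term are hypothetical and are not taken from the paper's models or data.

```python
def polygenic_score(dosages, weights):
    """Additive polygenic score: sum over variants of weight * allele dosage."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def score_with_interaction(dosages, weights, pair, gamma):
    """Additive score plus one hypothetical gene-gene interaction term.

    pair  -- indices (i, j) of the two interacting variants (assumption)
    gamma -- interaction effect size (illustrative value, not from the paper)
    """
    i, j = pair
    return polygenic_score(dosages, weights) + gamma * dosages[i] * dosages[j]


# Toy example: three variants, dosages are allele counts in {0, 1, 2}.
weights = [0.5, -0.25, 0.125]   # made-up per-variant effect sizes
person = [2, 1, 0]              # made-up genotype

linear = polygenic_score(person, weights)
nonlinear = score_with_interaction(person, weights, pair=(0, 1), gamma=0.5)
print(linear, nonlinear)
```

The difference between the two values is the part of the signal that a purely additive score cannot represent; the abstract reports that such nonlinear contributions appear to be small in the traits studied.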



Source journal

Nature Communications (Biological Science Disciplines)

CiteScore: 24.90
Self-citation rate: 2.40%
Articles published: 6928
Review time: 3.7 months

Journal description: Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.