A survey on deep learning for polygenic risk scores.

Impact factor 7.7 | Zone 2, Biology | JCR Q1, Biochemical Research Methods

Briefings in bioinformatics Pub Date : 2025-07-02 DOI:10.1093/bib/bbaf373

Max Schuran, Benjamin Goudey, Gillian S Dite, Enes Makalic

Volume 26, issue 4. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12454937/pdf/

Citations: 0

Abstract



A survey on deep learning for polygenic risk scores.

Polygenic risk scores (PRS) combine the effects of multiple genetic variants to predict an individual's genetic predisposition to a disease. PRS typically rely on linear models, which assume that all genetic variants act independently. They often fall short in predictive accuracy and are not able to explain the genetic variability of a trait to the full extent. There is growing interest in applying deep learning neural networks to model PRS given their ability to model non-linear relationships and strong performance in other domains. We conducted a survey of the literature to investigate how neural networks model PRS. We categorize deep learning-based approaches by their underlying architecture, highlighting their modeling assumptions, likely strengths and potential weaknesses of the architectures. Several categories of neural network architectures exhibited promising signs for the improvement of PRS' predictive power, namely sequence-based architectures, graph neural networks and those that incorporated biological knowledge. Additionally, the use of latent representations in autoencoders has improved predictive performance across diverse ancestries. However, a lack of existing model benchmarks on consistent datasets and phenotypes makes it challenging to understand the extent to which different architectures improve performance. Interpretability of deep learning-based PRS is also challenging with great care required when inferring causation. To address these challenges, we suggest the establishment and adherence to reporting standards and benchmarks to aid the development of deep learning-based PRS to find quantifiable trends in neural network architectures.
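The linear model that the abstract describes can be illustrated with a minimal sketch: each variant contributes its effect size times the number of effect-allele copies, and the contributions are summed independently. The function name `linear_prs`, the weights, and the genotype values below are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import Sequence


def linear_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive PRS: weighted sum of allele dosages (0, 1, or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    # Each variant contributes independently; there are no interaction terms.
    return sum(w * g for w, g in zip(weights, dosages))


# Hypothetical per-variant effect sizes (for example, log odds ratios).
weights = [0.5, -0.25, 1.0]
# Hypothetical genotype for one individual: effect-allele copies per variant.
dosages = [2, 1, 0]

score = linear_prs(dosages, weights)
print(score)  # 0.5*2 + (-0.25)*1 + 1.0*0 = 0.75
```

Because the score is a plain weighted sum, it cannot represent interactions between variants, which is the limitation that motivates the neural network approaches surveyed.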


Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.