Max Schuran, Benjamin Goudey, Gillian S Dite, Enes Makalic
Briefings in Bioinformatics, volume 26, issue 4. Published 2025-07-02. DOI: 10.1093/bib/bbaf373 (https://doi.org/10.1093/bib/bbaf373). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12454937/pdf/
A survey on deep learning for polygenic risk scores.
Polygenic risk scores (PRS) combine the effects of multiple genetic variants to predict an individual's genetic predisposition to a disease. PRS typically rely on linear models, which assume that all genetic variants act independently. Such scores often fall short in predictive accuracy and cannot fully explain the genetic variability of a trait. There is growing interest in applying deep neural networks to model PRS, given their ability to capture non-linear relationships and their strong performance in other domains. We conducted a survey of the literature to investigate how neural networks model PRS. We categorize deep learning-based approaches by their underlying architecture, highlighting the modeling assumptions, likely strengths and potential weaknesses of each architecture. Several categories of neural network architectures showed promise for improving the predictive power of PRS, namely sequence-based architectures, graph neural networks and those that incorporate biological knowledge. Additionally, the use of latent representations in autoencoders has improved predictive performance across diverse ancestries. However, a lack of model benchmarks on consistent datasets and phenotypes makes it challenging to understand the extent to which different architectures improve performance. Interpretability of deep learning-based PRS is also challenging, and great care is required when inferring causation. To address these challenges, we suggest establishing and adhering to reporting standards and benchmarks, to aid the development of deep learning-based PRS and to find quantifiable trends in neural network architectures.
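To make the linear assumption mentioned in the abstract concrete, the following is a minimal sketch of an additive PRS: each variant contributes its effect weight multiplied by the individual's risk-allele dosage, with no interaction terms. The function name `linear_prs`, the weights and the dosages are illustrative assumptions and do not come from the surveyed paper.

```python
from typing import Sequence


def linear_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive PRS: sum over variants of effect weight times allele dosage.

    Each variant contributes independently; there are no interaction terms,
    which is the linear assumption described in the abstract.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * g for w, g in zip(weights, dosages))


# Hypothetical example: three variants, dosage = number of risk alleles (0, 1 or 2).
example_weights = [0.5, -0.25, 1.0]  # illustrative effect sizes, not real estimates
example_dosages = [2, 1, 0]
example_score = linear_prs(example_dosages, example_weights)
print(example_score)  # 0.5*2 + (-0.25)*1 + 1.0*0 = 0.75
```

A neural network replaces this fixed weighted sum with a learned non-linear function of the same dosage vector, which is what allows it to capture interactions between variants.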
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.