Authors: Okan B Ozdemir, Ruining Chen, Ruowang Li
Journal: medRxiv - Genetic and Genomic Medicine
Publication date: 2024-08-02
DOI: https://doi.org/10.1101/2024.07.31.24311311
A Deep Ensemble Encoder Network Method for Improved Polygenic Risk Score Prediction
Genome-wide association studies (GWAS) of various heritable human traits and diseases have identified numerous associated single nucleotide polymorphisms (SNPs), most of which have small or modest effects. Polygenic risk scores (PRS) aim to better estimate an individual's genetic predisposition by aggregating the effects of multiple SNPs from GWAS. However, current PRS methods are designed to capture only simple linear genetic effects across the genome, which limits their ability to fully account for complex polygenic architecture. To address this, we propose DeepEnsembleEncodeNet (DEEN), a new method that ensembles autoencoders and fully connected neural networks (FCNNs) to better identify and model linear and non-linear SNP effects across different genomic regions, improving disease risk prediction. To demonstrate DEEN's performance, we optimized the model across binary and continuous traits from the UK Biobank (UKBB). Model evaluation on the held-out UKBB testing dataset, as well as on the independent All of Us (AoU) dataset, showed improved prediction and risk stratification, with DEEN consistently outperforming other methods.
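The "simple linear genetic effects" that the abstract attributes to current PRS can be illustrated with the conventional additive score, in which each individual's score is the sum of their allele dosages weighted by per-SNP GWAS effect sizes. The sketch below is only an illustration of that baseline under stated assumptions: the function name `linear_prs`, the 0/1/2 dosage coding, and the toy numbers are hypothetical and are not taken from the paper, and the code does not implement DEEN.

```python
from typing import List, Sequence


def linear_prs(
    dosages: Sequence[Sequence[float]],
    effect_sizes: Sequence[float],
) -> List[float]:
    """Return one additive score per individual.

    dosages[i][j] is the assumed allele count (0, 1, or 2) of SNP j in
    individual i; effect_sizes[j] is the assumed GWAS effect estimate
    for SNP j. The score is sum_j effect_sizes[j] * dosages[i][j].
    """
    scores = []
    for row in dosages:
        if len(row) != len(effect_sizes):
            raise ValueError("each individual needs one dosage per SNP")
        # Purely additive: no SNP-by-SNP interaction terms enter the sum.
        scores.append(sum(beta * g for beta, g in zip(effect_sizes, row)))
    return scores


# Hypothetical toy data: 3 individuals, 3 SNPs.
toy_dosages = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 0],
]
toy_effect_sizes = [0.10, -0.05, 0.20]

toy_scores = linear_prs(toy_dosages, toy_effect_sizes)
for index, score in enumerate(toy_scores):
    print(f"individual {index}: PRS = {score:.2f}")
```

Because each SNP contributes independently to the sum, a score of this form cannot represent interactions between SNPs, which is the limitation that non-linear models such as the autoencoder and FCNN ensemble described above are intended to address.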