A Deep Ensemble Encoder Network Method for Improved Polygenic Risk Score Prediction
Okan B Ozdemir, Ruining Chen, Ruowang Li
medRxiv - Genetic and Genomic Medicine, published 2024-08-02
DOI: https://doi.org/10.1101/2024.07.31.24311311
Abstract
Genome-wide association studies (GWAS) of various heritable human traits and diseases have identified numerous associated single nucleotide polymorphisms (SNPs), most of which have small or modest effects. Polygenic risk scores (PRS) aim to better estimate individuals' genetic predisposition by aggregating the effects of multiple SNPs from GWAS. However, current PRS methods are designed to capture only simple linear genetic effects across the genome, which limits their ability to fully account for complex polygenic architecture. To address this, we propose DeepEnsembleEncodeNet (DEEN), a new method that ensembles autoencoders and fully connected neural networks (FCNNs) to better identify and model linear and non-linear SNP effects across different genomic regions, thereby improving disease risk prediction. To demonstrate DEEN's performance, we optimized the model across binary and continuous traits from the UK Biobank (UKBB). Model evaluation on the held-out UKBB testing dataset, as well as on the independent All of Us (AoU) dataset, showed improved prediction and risk stratification, with DEEN consistently outperforming other methods.
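For readers unfamiliar with the linear baseline the abstract contrasts against, a conventional PRS can be read as a weighted sum of allele dosages, PRS_i = sum_j beta_j * g_ij. The following is a minimal sketch of that baseline only, not of DEEN itself. The function name `linear_prs`, the toy dosage matrix, and the effect sizes are all hypothetical values chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def linear_prs(genotypes, effect_sizes):
    """Return one score per individual: PRS_i = sum_j beta_j * g_ij.

    genotypes:    (n_individuals, n_snps) array of allele dosages (0, 1, or 2).
    effect_sizes: (n_snps,) array of per-SNP weights, e.g. GWAS effect estimates.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if genotypes.ndim != 2 or effect_sizes.ndim != 1:
        raise ValueError("expected a 2-D genotype matrix and a 1-D weight vector")
    if genotypes.shape[1] != effect_sizes.shape[0]:
        raise ValueError("number of SNP columns must match number of weights")
    # Matrix-vector product: each row (individual) is collapsed to one score.
    return genotypes @ effect_sizes


# Hypothetical toy data: 3 individuals, 3 SNPs.
dosages = np.array([[0, 1, 2],
                    [2, 0, 1],
                    [1, 1, 0]])
betas = np.array([0.10, -0.05, 0.20])

scores = linear_prs(dosages, betas)
print(scores)
```

Because each SNP contributes independently and additively, a score of this form cannot represent interactions or other non-linear effects between SNPs, which is the limitation the abstract says DEEN is intended to address.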