Cross-biobank generalizability and accuracy of electronic health record-based predictors compared to polygenic scores
Kira E. Detrois, Tuomo Hartonen, Maris Teder-Laving, Bradley Jermy, Kristi Läll, Zhiyu Yang, Estonian Biobank research team, FinnGen, Reedik Mägi, Samuli Ripatti, Andrea Ganna
Nature Genetics 57(9), 2136-2145 (published 2025-08-27). DOI: 10.1038/s41588-025-02298-9 (https://www.nature.com/articles/s41588-025-02298-9)
Abstract
Electronic health record (EHR)-based phenotype risk scores (PheRS) leverage individuals’ health trajectories to estimate disease risk, similar to how polygenic scores (PGS) use genetic information. While PGS generalizability has been studied, less is known about PheRS generalizability across healthcare systems and whether PheRS are complementary to PGS. We trained elastic-net-based PheRS to predict the onset of 13 common diseases for 845,929 individuals (age = 32–70 years) from three biobank-based studies in Finland (FinnGen), the UK (UKB) and Estonia (EstB). All PheRS were statistically significantly associated with the diseases of interest and most generalized well without retraining when applied to other studies. PheRS and PGS were only moderately correlated and models including both predictors improved onset prediction compared to PGS alone for 8 of 13 diseases. Our results indicate that EHR-based risk scores can transfer well between EHRs, capture largely independent information from PGS, and provide additive benefits for disease risk prediction.

Comparison of electronic health record-based phenotype risk scores (PheRS) and polygenic scores (PGS) across 13 common diseases and three biobank-based studies indicates that PheRS and PGS may provide additive benefits for risk prediction.
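The abstract describes PheRS as elastic-net models over EHR-derived features, playing the same role for health records that a PGS (a weighted sum over genetic variants) plays for genotypes. The sketch below is a minimal, generic illustration under stated assumptions, not the authors' pipeline: it assumes binary EHR indicator features (for example, whether a diagnosis code was recorded before a cutoff date), a logistic outcome model for disease onset, and the common elastic-net objective, mean log-loss plus λ(α‖w‖₁ + (1 − α)/2 ‖w‖₂²), fitted by proximal gradient descent. The function names (`fit_elastic_net_logistic`, `phers`) and all hyperparameters are hypothetical; the abstract does not specify the feature encoding, solver, or tuning.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    # Proximal operator of t * |v|: shrinks v toward 0 and sets it to exactly 0 if |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_elastic_net_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float,
    alpha: float,
    lr: float = 0.5,
    n_iter: int = 2000,
) -> Tuple[List[float], float]:
    """Hypothetical elastic-net logistic regression via proximal gradient descent.

    Minimizes mean log-loss + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    The intercept b is not penalized. lr must be small enough for the data
    (roughly 1 / Lipschitz constant); 0.5 is adequate for a few binary features.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        # Residuals p_i - y_i from the current linear predictor.
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi for xi, yi in zip(X, y)]
        grad_b = sum(resid) / n
        # Gradient of the smooth part: log-loss plus the L2 (ridge) share of the penalty.
        grad_w = [
            sum(r * xi[j] for r, xi in zip(resid, X)) / n + lam * (1 - alpha) * w[j]
            for j in range(p)
        ]
        b -= lr * grad_b
        # Gradient step on w, then the L1 proximal step (soft-thresholding).
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam * alpha) for j in range(p)]
    return w, b


def phers(x: Sequence[float], w: Sequence[float], b: float = 0.0) -> float:
    # A PheRS in this sketch is the fitted linear predictor: a weighted sum of EHR features.
    return b + sum(wj * xj for wj, xj in zip(w, x))
```

On a toy dataset where only the first feature tracks the outcome, the fit gives that feature a positive weight while the uninformative ones stay near zero; with `lam` large enough relative to the gradient magnitudes, the L1 step keeps every weight at exactly zero, which is the sparsity property that makes elastic net attractive for high-dimensional EHR features. Feeding a standardized PheRS and PGS as the two feature columns would be one hypothetical way to set up the kind of joint model the abstract compares against PGS alone; with continuous or many features, `lr` may need to be reduced for convergence.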