Min Seo Kim, Shaan Khurshid, Shinwan Kany, Lu-Chen Weng, Sarah Urbut, Carolina Roselli, Leonoor Wijdeveld, Sean J Jurgens, Joel T Rämö, Patrick T Ellinor, Akl C Fahed
Circulation: Genomic and Precision Medicine, e004943 (published 2025-06-17). DOI: 10.1161/CIRCGEN.124.004943. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12257488/pdf/
Machine Learning-Based Plasma Protein Risk Score Improves Atrial Fibrillation Prediction Over Clinical and Genomic Models.
Background: Clinical factors discriminate incident atrial fibrillation (AF) risk with moderate accuracy, with only modest improvement after incorporation of polygenic risk scores. Whether emerging large-scale proteomic profiling can augment AF risk estimation is unknown.
Methods: In the UK Biobank cohort, we derived and validated a machine learning model to predict incident AF risk using serum proteins (Pro-AF). We compared Pro-AF with a validated clinical risk score (Cohorts for Heart and Aging Research in Genomic Epidemiology-Atrial Fibrillation) and an AF polygenic risk score. Models were evaluated in 2 settings: a multiply resampled test set from nested cross-validation (internal test set) and a sample of UK Biobank participants held out from model development (hold-out test set). Metrics included discrimination of 5-year incident AF, assessed by the time-dependent area under the receiver operating characteristic curve, and net reclassification improvement.
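The Methods name the time-dependent area under the ROC curve (AUC) as the discrimination metric but do not specify the estimator. As an illustration only, the sketch below implements one common choice, the cumulative/dynamic AUC with inverse-probability-of-censoring weights: cases are people with an observed AF event by the horizon (5 years in this study), controls are people still event-free after it, and each case is up-weighted to stand in for people censored before the horizon. The function names and toy data are hypothetical; this is not the authors' code.

```python
from bisect import bisect_left

# Hypothetical helpers: an illustrative IPCW estimator, not the study's code.


def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(s) = P(C > s).

    Censorings (event == 0) play the role of 'events' here. Returns the sorted
    distinct censoring times and the value of G just after each of them.
    """
    cens_times = sorted({t for t, e in zip(times, events) if e == 0})
    surv, g = [], 1.0
    for u in cens_times:
        at_risk = sum(1 for t in times if t >= u)
        n_cens = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - n_cens / at_risk
        surv.append(g)
    return cens_times, surv


def g_before(s, cens_times, surv):
    """G(s-): censoring survival just before time s."""
    k = bisect_left(cens_times, s)  # number of censoring times strictly below s
    return 1.0 if k == 0 else surv[k - 1]


def cumulative_dynamic_auc(times, events, scores, horizon):
    """IPCW estimate of the cumulative/dynamic AUC at `horizon`.

    Cases:    event observed at or before the horizon (T_i <= horizon, event == 1).
    Controls: still event-free after the horizon (T_j > horizon).
    Each case is weighted by 1 / G(T_i-); the control weight 1 / G(horizon) is
    the same for every control and cancels. Tied scores count as 1/2.
    """
    cens_times, surv = km_censoring_survival(times, events)
    cases = [
        (s, 1.0 / g_before(t, cens_times, surv))
        for t, e, s in zip(times, events, scores)
        if e == 1 and t <= horizon
    ]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at the horizon")

    num = 0.0
    for s_case, w in cases:
        concordance = sum(
            1.0 if s_case > s_ctrl else 0.5 if s_case == s_ctrl else 0.0
            for s_ctrl in controls
        )
        num += w * concordance
    den = sum(w for _, w in cases) * len(controls)
    return num / den


# Toy data (times in years); not from the study.
times = [1, 2, 3, 4, 6, 7]
events = [1, 0, 1, 1, 0, 0]  # 1 = incident AF, 0 = censored
scores = [0.9, 0.2, 0.4, 0.7, 0.5, 0.1]
print(round(cumulative_dynamic_auc(times, events, scores, horizon=5), 4))  # prints 0.8214
```

With no censoring before the horizon, every case weight is 1 and the estimate reduces to the ordinary case-versus-control concordance. The quadratic case-by-control loop is only for readability; cohort-scale data would call for a sort-based implementation or an established survival-analysis package.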
Results: Trained in 32 631 UK Biobank participants, Pro-AF predicts incident AF using 121 protein levels (out of 2911 protein analytes). In the internal test set (30 632 individuals; mean age 57±8 years; 54% women; 2045 AF events) and the hold-out test set (13 998 individuals; mean age 57±8 years; 54% women; 870 AF events), discrimination of 5-year incident AF was highest for Pro-AF (area under the receiver operating characteristic curve [AUROC], internal: 0.761 [95% CI, 0.745-0.780]; hold-out: 0.763 [0.734-0.784]), followed by the Cohorts for Heart and Aging Research in Genomic Epidemiology-Atrial Fibrillation (CHARGE-AF) score (0.719 [0.700-0.737]; 0.702 [0.668-0.730]) and the polygenic risk score (0.686 [0.668-0.702]; 0.682 [0.660-0.710]). AF risk estimates were well calibrated, and adding Pro-AF to CHARGE-AF yielded a substantial continuous net reclassification improvement (eg, internal test set: 0.410 [0.330-0.492]). A simplified Pro-AF including only the 5 most influential proteins (NT-proBNP [N-terminal pro-B-type natriuretic peptide], EDA2R [ectodysplasin A2 receptor], NPPB [B-type natriuretic peptide], BCAN [brevican core protein], and GDF15 [growth/differentiation factor 15]) retained favorable discrimination (AUROC internal: 0.750 [0.733-0.768]; hold-out: 0.759 [0.732-0.790]).
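The continuous (category-free) net reclassification improvement reported above asks, person by person, whether adding Pro-AF moves predicted risk in the right direction: up for people who develop AF, down for people who do not. The minimal sketch below assumes a fully observed binary outcome and ignores censoring, whereas the study's 5-year estimate was computed on time-to-event data, so it shows the idea rather than reproducing the authors' calculation. Function and variable names are hypothetical.

```python
# Hypothetical helper: binary-outcome continuous NRI, not the study's code.


def continuous_nri(risk_old, risk_new, outcome):
    """Category-free (continuous) net reclassification improvement.

    NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)],
    where 'up'/'down' means the new model assigns a higher/lower risk than the
    old model to the same person (unchanged risks count as neither).
    Assumes a fully observed binary outcome; censoring is not handled.
    """
    up_ev = down_ev = n_ev = 0
    up_non = down_non = n_non = 0
    for old, new, y in zip(risk_old, risk_new, outcome):
        up, down = int(new > old), int(new < old)
        if y == 1:
            n_ev += 1
            up_ev += up
            down_ev += down
        else:
            n_non += 1
            up_non += up
            down_non += down
    if n_ev == 0 or n_non == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up_ev - down_ev) / n_ev
    nri_nonevents = (down_non - up_non) / n_non
    return nri_events + nri_nonevents


# Toy predicted 5-year risks; not from the study.
risk_clinical = [0.10, 0.20, 0.30, 0.05, 0.15, 0.25]
risk_combined = [0.20, 0.10, 0.40, 0.02, 0.10, 0.30]
had_af = [1, 1, 1, 0, 0, 0]
print(continuous_nri(risk_clinical, risk_combined, had_af))
```

Here each component contributes 1/3 (2 of 3 events move up versus 1 down; 2 of 3 non-events move down versus 1 up), giving an NRI of 2/3. The statistic ranges from -2 to 2 and counts only the direction of each risk change, not its size.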
Conclusions: A machine learning-based protein score discriminates 5-year incident AF risk favorably compared with clinical and genetic risk factors. Large-scale proteomic analysis may help prioritize individuals at risk of AF for screening and related preventive interventions.
About the journal:
Circulation: Genomic and Precision Medicine is a distinguished journal dedicated to advancing the frontiers of cardiovascular genomics and precision medicine. It publishes a diverse array of original research articles that delve into the genetic and molecular underpinnings of cardiovascular diseases. The journal's scope is broad, encompassing studies from human subjects to laboratory models, and from in vitro experiments to computational simulations.
Circulation: Genomic and Precision Medicine is committed to publishing studies that have direct relevance to human cardiovascular biology and disease, with the ultimate goal of improving patient care and outcomes. The journal serves as a platform for researchers to share their groundbreaking work, fostering collaboration and innovation in the field of cardiovascular genomics and precision medicine.