Machine Learning Models of Polygenic Risk for Enhanced Prediction of Alzheimer Disease Endophenotypes.

Journal metrics: impact factor 3.0; CAS tier 3 (Medicine); JCR Q2, Clinical Neurology
Neurology: Genetics, 10(1), e200120. Published 2024-01-10; eCollection date 2024-02-01. DOI: 10.1212/NXG.0000000000200120
Nathaniel B Gunter, Robel K Gebre, Jonathan Graff-Radford, Michael G Heckman, Clifford R Jack, Val J Lowe, David S Knopman, Ronald C Petersen, Owen A Ross, Prashanthi Vemuri, Vijay K Ramanan

Abstract

Background and objectives: Alzheimer disease (AD) has a polygenic architecture, for which genome-wide association studies (GWAS) have helped elucidate sequence variants (SVs) influencing susceptibility. Polygenic risk score (PRS) approaches show promise for generating summary measures of inherited risk for clinical AD based on the effects of APOE and other GWAS hits. However, existing PRS approaches, based on traditional regression models, explain only a modest proportion of the variation in AD dementia risk and AD-related endophenotypes. We hypothesized that machine learning (ML) models of polygenic risk (ML-PRS) could outperform standard regression-based PRS methods and would therefore have the potential for greater clinical utility.
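
For reference, a conventional PRS is typically an additive weighted sum over variants, PRS = Σ_j β_j × g_j, where g_j is a participant's count of effect alleles (0, 1 or 2) at variant j and β_j is that variant's per-allele effect size from GWAS summary statistics. The sketch below assumes this standard additive form; the variant names and weights are hypothetical placeholders, not the SVs or effect sizes used in this study.

```python
from typing import Dict, List


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Additive PRS: sum of (effect-allele dosage x GWAS per-allele effect size).

    dosages: variant ID -> number of effect alleles (0, 1 or 2; imputed dosages may be fractional)
    weights: variant ID -> per-allele effect size (e.g. log odds ratio) from GWAS summary statistics
    Variants absent from a participant's genotype data are simply skipped here;
    real pipelines usually impute or mean-fill them instead.
    """
    return sum(beta * dosages[variant] for variant, beta in weights.items() if variant in dosages)


# Hypothetical variant IDs and log-odds weights, for illustration only.
GWAS_WEIGHTS = {"variant_A": 1.10, "variant_B": -0.50, "variant_C": 0.12}

participants: List[Dict[str, float]] = [
    {"variant_A": 0, "variant_B": 1, "variant_C": 2},
    {"variant_A": 2, "variant_B": 0, "variant_C": 1},
]

scores = [polygenic_risk_score(p, GWAS_WEIGHTS) for p in participants]
print([round(s, 2) for s in scores])  # [-0.26, 2.32]
```

Because every variant enters the score linearly and independently, this form cannot by itself represent dominance or interaction effects, which is one of the gaps the ML-PRS approach targets.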

Methods: We analyzed combined data from the Mayo Clinic Study of Aging (n = 1,791) and the Alzheimer's Disease Neuroimaging Initiative (n = 864). An AD PRS was computed for each participant using the top common SVs obtained from a large AD dementia GWAS. In parallel, ML models were trained using those SV genotypes, with amyloid PET burden as the primary outcome. Secondary outcomes included amyloid PET positivity and clinical diagnosis (cognitively unimpaired vs impaired). We compared the performance of ML-PRS and standard PRS across 100 training sessions, each with a different random data split. In each session, data were split into 80% training and 20% testing sets, and five-fold cross-validation was then used within the training set to select the best-performing model for evaluation on the test set. We also applied permutation importance techniques to assess which genetic factors contributed most to outcome prediction.
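
The evaluation scheme (100 random 80%/20% train/test splits, with five-fold cross-validation inside each training portion to choose the model) can be sketched with the Python standard library. The learner is a hypothetical stand-in, a k-nearest-neighbours regressor whose neighbourhood size is the hyperparameter tuned by the inner cross-validation; the study's actual models, hyperparameter grids, and data are not described in this abstract, and the toy data below are synthetic.

```python
import random
import statistics
from typing import List, Sequence

Row = Sequence[float]


def knn_predict(train_X: List[Row], train_y: List[float], x: Row, k: int) -> float:
    # Mean outcome of the k training rows closest to x (squared Euclidean distance).
    # Sorting on distance only keeps ties in training order rather than favouring low y.
    ranked = sorted(
        ((sum((a - b) ** 2 for a, b in zip(row, x)), target) for row, target in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    return statistics.fmean(target for _, target in ranked[:k])


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Coefficient of determination (assumes y_true is not constant).
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_k(X, y, candidates, n_folds, rng) -> int:
    # Inner k-fold cross-validation on the training portion only:
    # return the candidate k with the lowest mean squared validation error.
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    best_k, best_err = candidates[0], float("inf")
    for k in candidates:
        sq_errors = []
        for fold in folds:
            held_out = set(fold)
            fit_X = [X[i] for i in idx if i not in held_out]
            fit_y = [y[i] for i in idx if i not in held_out]
            sq_errors += [(y[i] - knn_predict(fit_X, fit_y, X[i], k)) ** 2 for i in fold]
        err = statistics.fmean(sq_errors)
        if err < best_err:
            best_k, best_err = k, err
    return best_k


def repeated_holdout(X, y, n_repeats=100, test_frac=0.2, candidates=(5, 10, 20), n_folds=5, seed=0) -> List[float]:
    # One test-set r^2 per random 80/20 split; test rows never enter model selection.
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        n_test = round(test_frac * len(X))
        test, train = idx[:n_test], idx[n_test:]
        train_X, train_y = [X[i] for i in train], [y[i] for i in train]
        k = select_k(train_X, train_y, candidates, n_folds, rng)
        preds = [knn_predict(train_X, train_y, X[i], k) for i in test]
        scores.append(r_squared([y[i] for i in test], preds))
    return scores


# Synthetic toy data (hypothetical): 3 variants coded 0/1/2 and a noisy outcome.
gen = random.Random(42)
X = [[gen.choice([0, 1, 2]) for _ in range(3)] for _ in range(150)]
y = [1.0 * (row[0] >= 1) + 0.3 * row[1] + gen.gauss(0, 0.3) for row in X]

r2s = repeated_holdout(X, y, n_repeats=10)  # the study used 100 splits
print(f"mean test r^2 over {len(r2s)} splits: {statistics.fmean(r2s):.2f}")
```

Keeping model selection inside the training portion means each test-set r² is computed on participants the tuned model has never seen, and repeating over many splits shows how much the estimate depends on the particular partition.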

Results: ML-PRS models outperformed the AD PRS in explaining variation in amyloid PET burden (r² = 0.28 vs r² = 0.24 in the test set). Among ML approaches, methods accounting for nonlinear genetic influences were superior to linear methods. Compared with the standard PRS, ML-PRS models were also more accurate in predicting amyloid PET positivity (area under the curve [AUC] = 0.80 vs AUC = 0.63) and the presence of cognitive impairment (AUC = 0.75 vs AUC = 0.54).
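
AUC can be read as the probability that a randomly chosen positive participant (for example, amyloid PET positive) receives a higher score than a randomly chosen negative one, so 0.5 is chance level and the standard PRS's AUC of 0.54 for cognitive impairment is barely above it. The sketch below computes AUC in that rank-based form and pairs it with a generic permutation importance routine of the kind named in the Methods, which shuffles one feature at a time and records the drop in score. The toy model and data are hypothetical; this is not the study's code.

```python
import random
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # Mann-Whitney form of the AUC: probability that a random positive (label 1)
    # outscores a random negative (label 0); ties count one half. O(n_pos * n_neg).
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[List[List[float]]], List[float]],
    X: List[List[float]],
    y: Sequence[int],
    score: Callable[[Sequence[int], Sequence[float]], float],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    # For each feature column: shuffle it, re-score the already-fitted model, and
    # record the mean drop from the unshuffled baseline (larger = more important).
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical "fitted model" that only uses feature 0, on six toy participants.
def toy_model(rows: List[List[float]]) -> List[float]:
    return [2.0 * row[0] for row in rows]


X = [[0, 1], [0, 2], [1, 0], [1, 1], [2, 0], [2, 2]]
y = [0, 0, 0, 1, 1, 1]
auc = roc_auc(y, toy_model(X))  # 8.5 / 9, about 0.94
importances = permutation_importance(toy_model, X, y, roc_auc)
print(f"AUC = {auc:.2f}; importances = {[round(v, 3) for v in importances]}")
```

The unused feature gets an importance of exactly zero, while shuffling the feature the model relies on lowers the AUC; with a real model, the same loop ranks genetic features by how much predictive performance depends on them.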

Discussion: We found that ML-PRS approaches improved upon standard PRS for predicting AD endophenotypes, partly reflecting better accounting for nonlinear effects of genetic susceptibility alleles. Further adaptations of the ML-PRS framework could help account for the remaining unexplained heritability of AD and thereby facilitate more accurate presymptomatic and early-stage risk stratification for clinical decision-making.
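
A minimal illustration of a nonlinear genetic effect: below, a hypothetical variant raises the outcome only in carriers of two copies. A model that is linear in allele dosage, the form each variant takes in a standard additive PRS, cannot fit this pattern, whereas a model that gives each genotype class its own mean can. The data are synthetic and noise-free and are not drawn from the study.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def fit_linear(g: Sequence[int], y: Sequence[float]) -> Tuple[float, float]:
    # Ordinary least squares for y = a + b * dosage (additive coding).
    mg, my = fmean(g), fmean(y)
    b = sum((x - mg) * (t - my) for x, t in zip(g, y)) / sum((x - mg) ** 2 for x in g)
    return my - b * mg, b


def fit_genotypic(g: Sequence[int], y: Sequence[float]) -> Dict[int, float]:
    # One mean per genotype class (0, 1, 2): can represent dominant or recessive patterns.
    return {c: fmean([t for x, t in zip(g, y) if x == c]) for c in sorted(set(g))}


def r_squared(y: Sequence[float], pred: Sequence[float]) -> float:
    my = fmean(y)
    return 1 - sum((t - p) ** 2 for t, p in zip(y, pred)) / sum((t - my) ** 2 for t in y)


# Hypothetical recessive-style effect: the outcome rises only with two copies.
dosage: List[int] = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
outcome: List[float] = [0.0 if d < 2 else 1.0 for d in dosage]

a, b = fit_linear(dosage, outcome)
means = fit_genotypic(dosage, outcome)
r2_linear = r_squared(outcome, [a + b * d for d in dosage])
r2_genotypic = r_squared(outcome, [means[d] for d in dosage])
print(f"linear in dosage: r^2 = {r2_linear:.2f}; per-genotype means: r^2 = {r2_genotypic:.2f}")
# prints: linear in dosage: r^2 = 0.75; per-genotype means: r^2 = 1.00
```

The dosage-linear fit leaves a quarter of the variance unexplained even with no noise, because it must split the difference between heterozygotes and homozygotes; flexible ML models can capture such patterns, and interactions between variants, without them being specified in advance.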
