Predicting functional outcome in ischemic stroke patients using genetic, environmental, and clinical factors: a machine learning analysis of population-based prospective cohort study.

IF 6.8 · Zone 2 (Biology) · JCR Q1 (Biochemical Research Methods)
Siding Chen, Zhe Xu, Jinfeng Yin, Hongqiu Gu, Yanfeng Shi, Cang Guo, Xia Meng, Hao Li, Xinying Huang, Yong Jiang, Yongjun Wang
Journal: Briefings in Bioinformatics, 25(6)
Publication date: 2024-09-23
Publication type: Journal Article
DOI: 10.1093/bib/bbae487 (https://doi.org/10.1093/bib/bbae487)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471838/pdf/
Citation count: 0

Abstract

Ischemic stroke (IS) is a leading cause of adult disability and can severely compromise patients' quality of life. Accurately predicting functional outcome after IS is crucial for precise risk stratification and effective therapeutic intervention. We developed a predictive model integrating genetic, environmental, and clinical factors using data from 7819 IS patients in the Third China National Stroke Registry. The dataset was randomly divided 80:20 into development and internal validation cohorts. Model performance was evaluated in the internal validation cohort: discrimination by the area under the receiver operating characteristic curve (AUC), and calibration by the Brier score and calibration curves. A genome-wide association study (GWAS) in the development cohort identified rs11109607 (ANKS1B) as the variant most significantly associated with IS functional outcome. We applied principal component analysis to reduce the dimensionality of the top 100 significant variants identified by the GWAS and incorporated them as genetic factors in the predictive model. We employed a machine learning algorithm capable of identifying nonlinear relationships to build predictive models of functional outcome in IS patients. The best-performing model was XGBoost, which outperformed logistic regression (AUC 0.818 versus 0.756, P < .05) and significantly improved reclassification. By combining genetic, environmental, and clinical factors, the study offers new insight into predicting IS functional outcome in East Asian populations.
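The evaluation design described in the abstract (a random 80:20 split, AUC for discrimination, Brier score for calibration) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the fixed seed, and the toy labels and probabilities are all assumptions made for illustration, and the study's GWAS, PCA, and XGBoost steps are not reproduced here.

```python
import random


def split_development_validation(records, validation_fraction=0.2, seed=0):
    """Randomly split records into (development, validation) lists.

    The seed is a hypothetical choice; the abstract only states an 80:20 random split.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def auc(y_true, y_score):
    """AUC as the probability that a random positive case is scored above
    a random negative case, counting ties as one half."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data only (not from the study): 1 = poor functional outcome, 0 = good outcome.
toy_labels = [0, 0, 1, 1]
toy_probabilities = [0.1, 0.4, 0.35, 0.8]

development, validation = split_development_validation(range(7819))
print(len(development), len(validation))
print(auc(toy_labels, toy_probabilities))          # 0.75 for this toy data
print(brier_score(toy_labels, toy_probabilities))
```

The pairwise AUC above is quadratic in cohort size, which is fine for a sketch; a rank-based implementation or an established statistics library would be the usual choice at the scale of the registry. A lower Brier score indicates better-calibrated probabilities, while a higher AUC indicates better discrimination.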

Source journal: Briefings in Bioinformatics
Category: Biology, Biochemical Research Methods
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.