Development of machine learning-based models to predict 10-year risk of cardiovascular disease: a prospective cohort study.

IF 4.4 · Zone 1 (Medicine) · Q1 CLINICAL NEUROLOGY
Jia You, Yu Guo, Ju-Jiao Kang, Hui-Fu Wang, Ming Yang, Jian-Feng Feng, Jin-Tai Yu, Wei Cheng
Stroke and Vascular Neurology, pp. 475-485. Published 2023-12-29. Journal Article.
DOI: https://doi.org/10.1136/svn-2023-002332
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10800279/pdf/
Citations: 0

Abstract

Development of machine learning-based models to predict 10-year risk of cardiovascular disease: a prospective cohort study.

Background: Previous prediction algorithms for cardiovascular disease (CVD) were built on risk factors selected largely from empirical clinical knowledge. This study sought to identify predictors from a comprehensive variable space and then to employ machine learning (ML) algorithms to develop a novel CVD risk prediction model.

Methods: From the UK Biobank, a longitudinal population-based cohort, this study included 473 611 CVD-free participants aged 37 to 73 years. We implemented an ML-based, data-driven pipeline to identify predictors from 645 candidate variables covering a comprehensive range of health-related factors, and assessed multiple ML classifiers to establish a risk prediction model for 10-year incident CVD. The model was validated through leave-one-center-out cross-validation.
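The leave-one-center-out scheme named in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `leave_one_center_out` and the toy center labels are hypothetical, and only the Python standard library is used. Each fold holds out every participant from one assessment center and trains on all the others.

```python
from collections import defaultdict


def leave_one_center_out(centers):
    """Yield (held_out_center, train_idx, test_idx) once per distinct center.

    `centers` holds one center label per participant (hypothetical input).
    """
    by_center = defaultdict(list)
    for i, center in enumerate(centers):
        by_center[center].append(i)
    for held_out in sorted(by_center):
        test_idx = by_center[held_out]
        # Train on every participant recruited at any other center.
        train_idx = sorted(
            i for c, idx in by_center.items() if c != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


# Toy labels for seven participants across three made-up centers.
centers = ["A", "A", "B", "C", "B", "C", "A"]
folds = list(leave_one_center_out(centers))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because no center contributes to both training and testing within a fold, the per-fold scores approximate how a model might transfer to a recruitment site it has not seen. A "mean ± SD" summary such as those reported in the Results would then be taken over the folds.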

Results: Over a median follow-up of 12.2 years, 31 466 participants developed CVD within 10 years of their baseline visit. A novel UK Biobank CVD risk prediction (UKCRP) model was established, comprising 10 predictors: age, sex, cholesterol and blood pressure medication, cholesterol ratio (total/high-density lipoprotein), systolic blood pressure, previous angina or heart disease, number of medications taken, cystatin C, chest pain and pack-years of smoking. The model achieved satisfactory discriminative performance, with an area under the receiver operating characteristic curve (AUC) of 0.762±0.010 that outperformed multiple existing clinical models, and it was well calibrated, with a Brier score of 0.057±0.006. Further, the UKCRP achieved comparable performance for myocardial infarction (AUC 0.774±0.011) and ischaemic stroke (AUC 0.730±0.020), but inferior performance for haemorrhagic stroke (AUC 0.644±0.026).
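For readers unfamiliar with the two metrics reported above, the following sketch shows how AUC and the Brier score are defined, using standard-library Python and made-up toy predictions. The function names and toy values are illustrative assumptions, not the study's code or data. The last lines use only the counts quoted in the abstract to derive a rough reference point: the Brier score of a constant prediction equal to the crude 10-year event rate, ignoring censoring.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up outcomes and predicted risks for four participants.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]
toy_auc = roc_auc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)

# Crude event rate from the abstract's counts (31 466 of 473 611).
event_rate = 31466 / 473611
# Brier score of always predicting the event rate: p * (1 - p).
constant_brier = event_rate * (1 - event_rate)
print(toy_auc, toy_brier, event_rate, constant_brier)
```

Under these assumptions the crude event rate is about 6.6% and the constant-prediction Brier score is about 0.062, which gives some context for the reported 0.057: with a rare outcome, Brier scores are small in absolute terms, so they are best read against such a reference rather than on their own.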

Conclusion: ML-based classification models can learn expressive representations from participants at potentially high risk of CVD, who may benefit from earlier clinical decisions.

Source journal
Stroke and Vascular Neurology (Medicine-Cardiology and Cardiovascular Medicine)
CiteScore: 11.20
Self-citation rate: 1.70%
Articles published: 63
Review time: 15 weeks
About the journal: Stroke and Vascular Neurology (SVN) is the official journal of the Chinese Stroke Association. Supported by a team of renowned Editors, and fully Open Access, the journal encourages debate on controversial techniques, issues on health policy and social medicine.