Machine Learning-Based Analysis of Lifestyle Risk Factors for Atherosclerotic Cardiovascular Disease: Retrospective Case-Control Study.

Impact factor: 3.8 | Zone 3 (Medicine) | JCR Q2 (Medical Informatics)
Hye-Jin Kim, Heeji Choi, Hyo-Jung Ahn, Seung-Ho Shin, Chulho Kim, Sang-Hwa Lee, Jong-Hee Sohn, Jae-Jun Lee
DOI: 10.2196/74415 (https://doi.org/10.2196/74415)
Journal: JMIR Medical Informatics, volume 13, e74415
Publication date: 2025-08-07
Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12330983/pdf/
Citations: 0

Machine Learning-Based Analysis of Lifestyle Risk Factors for Atherosclerotic Cardiovascular Disease: Retrospective Case-Control Study.

Background: The risk of developing atherosclerotic cardiovascular disease (ASCVD) varies among individuals and is related to a variety of lifestyle factors in addition to the presence of chronic diseases.

Objective: We aimed to assess the predictive accuracy of machine learning (ML) models incorporating lifestyle risk behaviors for ASCVD risk using the Korean nationwide database.

Methods: Using data from the Korea National Health and Nutrition Examination Survey, we applied 5 ML algorithms to predict high ASCVD risk: logistic regression (LR), support vector machine, random forest, extreme gradient boosting, and light gradient boosting models. ASCVD risk was assessed using the pooled cohort equations, with a high-risk threshold of ≥7.5% over 10 years. Among the 8573 participants aged 40-79 years, propensity score matching (PSM) was used to adjust for demographic confounders. We divided the dataset into training and test sets in an 8:2 ratio. We also used bootstrapping when training the ML models, with the area under the receiver operating characteristic curve as the score. Shapley additive explanations were used to identify the variables most important to the models in assessing high ASCVD risk. As a sensitivity analysis, we additionally performed binary LR analysis, whose results were consistent with those of the ML models.
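The labeling, splitting, and bootstrapping steps described in the Methods can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names, the seeds, and the number of resamples are hypothetical, and the abstract does not say exactly how bootstrapping entered training, so the sketch assumes it was used to resample scored predictions and recompute the area under the receiver operating characteristic curve.

```python
import random

# From the abstract: 10-year pooled cohort equations risk >= 7.5% is "high risk".
HIGH_RISK_THRESHOLD = 0.075


def label_high_risk(pce_risk):
    """Return 1 when the 10-year risk meets the >= 7.5% threshold, else 0."""
    return 1 if pce_risk >= HIGH_RISK_THRESHOLD else 0


def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 8:2 into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def auroc(labels, scores):
    """AUROC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(labels, scores, n_boot=200, seed=0):
    """Resample (label, score) pairs with replacement and collect the AUROCs.

    Assumption: this is only one plausible reading of "bootstrapping with the
    AUROC score"; the abstract does not give the exact procedure.
    """
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap an AUROC")
    rng = random.Random(seed)
    pairs = list(zip(labels, scores))
    results = []
    while len(results) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        ys = [y for y, _ in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain a single class
            results.append(auroc(ys, [s for _, s in sample]))
    return results
```

The spread of the values returned by `bootstrap_auroc` gives a rough sense of how stable a model's discrimination is across resamples, which is one common reason to pair bootstrapping with a single 8:2 split.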

Results: Of the 8573 participants, 41.7% (n=3578) had high ASCVD risk. Before PSM, age and sex differed significantly between groups. PSM (1:1) yielded 1976 patients with balanced demographics. After PSM, the high ASCVD risk group had higher alcohol or tobacco use, lower omega-3 intake, higher BMI, less physical activity, and less time spent sitting. Among the 5 ML models, the extreme gradient boosting model showed the highest area under the receiver operating characteristic curve, indicating superior overall discrimination between high and low ASCVD risk groups. However, the light gradient boosting model performed better in accuracy, recall, and F1-score. Variable importance analysis using Shapley additive explanations identified smoking and age as the strongest predictors, while BMI, sodium or omega-3 intake, and low-density lipoprotein cholesterol were also significant variables. Sensitivity analysis using multivariable LR analysis confirmed these findings, showing that smoking, BMI, and low-density lipoprotein cholesterol increased ASCVD risk, whereas omega-3 intake and physical activity were associated with lower risk.
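The Results compare models on accuracy, recall, and F1-score as well as on the area under the receiver operating characteristic curve. The sketch below shows how the three threshold-dependent metrics follow from a confusion matrix; the function name and the toy labels are hypothetical and are not data from the study.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = high risk)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 4 high-risk and 4 low-risk participants.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 0, 1, 0, 1, 0, 0, 1])
```

These metrics depend on the cutoff that turns a predicted probability into a class label, whereas the area under the curve summarizes ranking quality across all cutoffs. That difference is one reason a model can lead on the area under the curve while another leads on accuracy, recall, and F1-score.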

Conclusions: Analyzing lifestyle behavioral factors in ASCVD risk with an ML model improves predictive performance compared with traditional models. Personalized prevention strategies tailored to an individual's lifestyle can effectively reduce ASCVD risk.

Source journal: JMIR Medical Informatics (Medicine: Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks
About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, and industry and health informatics professionals. It is published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175). JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers and citizens, who are the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.