Machine learning-based risk factor analysis and prediction model construction for mortality in chronic heart failure.

IF 4.3 | Zone 3 (Medicine) | Q1 | PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH
Qian Xu, Ruicong Yu, Xue Cai, Guanjie Chen, Yueyue Zheng, Cuirong Xu, Jing Sun
Journal of Global Health, vol. 15, 04242. Published 2025-09-12. Journal Article.
DOI: https://doi.org/10.7189/jogh.15.04242
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12427600/pdf/
Citations: 0

Abstract

Background: Chronic heart failure (CHF) carries a high global mortality burden, and traditional risk prediction tools are limited in accuracy and comprehensiveness. Machine learning (ML) has the potential to improve prediction performance, and a health ecology framework can systematically identify multi-dimensional risk factors. We therefore aimed to develop an ML-based mortality risk prediction model for CHF and to analyse its risk factors using a health ecology framework.

Methods: We enrolled 489 CHF patients from the Jackson Heart Database, with all-cause mortality during a 10-year follow-up period as the outcome measure. Guided by a five-layer health ecology framework (individual traits, behavioural characteristics, interpersonal relationships, work/living conditions, and macro policies), we selected 58 variables for analysis. The cohort was split 7:3 into training and validation sets. Before modelling, we applied five oversampling techniques to address data imbalance; random forest (RF) and k-nearest neighbour (KNN) models then identified mortality predictors. We trained seven ML algorithms, validated them via 10-fold cross-validation, and compared them using accuracy, the area under the curve (AUC), and other metrics.
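The 7:3 split and 10-fold cross-validation described in the Methods can be illustrated with a minimal standard-library sketch. The abstract does not say whether the split was stratified, which random seed was used, or how many patients died, so the stratification, the seeds, and the 30% event rate below are all assumptions made purely for illustration; `stratified_split` and `k_folds` are hypothetical helper names, not the authors' code.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, valid_idx), keeping the class ratio similar in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, valid = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_frac)
        train.extend(idxs[:cut])
        valid.extend(idxs[cut:])
    return sorted(train), sorted(valid)

def k_folds(indices, k=10, seed=0):
    """Yield (fit_idx, holdout_idx) pairs; every index is held out exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]

# Hypothetical labels for 489 patients; the 147 deaths are an invented figure.
labels = [1] * 147 + [0] * 342
train_idx, valid_idx = stratified_split(labels)
folds = list(k_folds(train_idx))
```

In a workflow like this, any resampling for class imbalance would normally be applied to the fitting portion of each fold only, so that held-out patients remain real, unmodified observations; the abstract does not state how the authors ordered these steps.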

Results: We identified 24 key factors: 19 for individual traits (including age, body mass index (BMI), antihypertensive medication, hypoglycaemic medication, antiarrhythmic medication, systolic blood pressure, glycated haemoglobin, glomerular filtration rate, left ventricular ejection fraction, left ventricular diastolic diameter, left ventricular mass, high-density lipoproteins, low-density lipoproteins, triglycerides, total cholesterol, cardiovascular surgical history, and mitral annular early diastolic peak velocity of motion); three for individual behavioural characteristics (dark greens intake, egg intake, and night-time sleep duration); and two for living and working conditions (favourite food shop within a three-kilometre radius, and proportion of poor people in the place of residence). The optimal model combined synthetic minority over-sampling technique with edited nearest neighbours (SMOTE-ENN) resampling and an extreme gradient boosting (XGBoost) classifier; it achieved an accuracy of 81.58%, an AUC of 0.83, a precision of 0.87, a recall of 0.84, and an F1 score of 0.86 for predicting mortality at 10-year follow-up.
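As a reading aid for the metrics reported above, the following sketch shows how accuracy, precision, recall, and F1 follow from a confusion matrix, and recomputes F1 from the rounded precision and recall in the abstract. The confusion-matrix counts are invented toy values, not the study's data, and `classification_metrics` and `f1_from` are hypothetical helper names.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Toy counts (invented) to show the mechanics.
toy = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# Consistency check on the rounded values reported in the abstract.
f1_check = f1_from(0.87, 0.84)
print(round(f1_check, 4))
```

Recomputing from the two-decimal inputs gives roughly 0.855, which is compatible with the reported 0.86 once rounding of precision and recall is allowed for.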

Conclusions: We systematically categorised CHF mortality risk factors by integrating health ecology theory and ML. The combined SMOTE-ENN and XGBoost model demonstrated high accuracy (81.58%), though further optimisation is needed to enhance clinical utility in CHF risk prediction.
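For readers unfamiliar with the SMOTE-ENN resampling named above, the idea can be sketched in plain Python: SMOTE adds synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours, and edited nearest neighbours (ENN) then drops rows whose label disagrees with their neighbourhood. This is a simplified sketch under stated assumptions (binary labels, Euclidean distance, majority-vote editing, invented two-feature toy data); it is not the authors' pipeline, and library implementations such as imbalanced-learn's `SMOTEENN` differ in their details and defaults.

```python
import math
import random

def nearest(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def smote(X, y, minority=1, k=5, seed=0):
    """Add synthetic minority rows until the two classes are the same size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(X) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(minority_rows))
        j = rng.choice(nearest(i, minority_rows, k))
        gap = rng.random()  # position along the segment between the two rows
        base, neighbour = minority_rows[i], minority_rows[j]
        X_out.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
        y_out.append(minority)
    return X_out, y_out

def edited_nearest_neighbours(X, y, k=3):
    """Keep only rows whose label matches the majority of their k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = [y[j] for j in nearest(i, X, k)]
        if votes.count(y[i]) * 2 > len(votes):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# Invented toy data: 40 rows of class 0 near (0, 0), 10 rows of class 1 near (2, 2).
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)]
y = [0] * 40 + [1] * 10

X_bal, y_bal = smote(X, y)                                  # balance the classes
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal)  # remove ambiguous rows
```

The cleaned data would then be passed to a classifier; in the study that classifier was XGBoost, which is omitted here to keep the sketch dependency-free.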

Source journal
Journal of Global Health (PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH)
CiteScore: 6.10
Self-citation rate: 2.80%
Articles published: 240
Review time: 6 weeks
About the journal: Journal of Global Health is a peer-reviewed journal published by the Edinburgh University Global Health Society, a not-for-profit organization registered in the UK. We publish editorials, news, viewpoints, original research and review articles in two issues per year.