Enhanced cardiovascular risk prediction in the Western Pacific: A machine learning approach tailored to the Malaysian population.

IF 2.6 | Zone 3 (multidisciplinary journals) | Q1 MULTIDISCIPLINARY SCIENCES

PLoS ONE Pub Date : 2025-06-17 eCollection Date: 2025-01-01 DOI:10.1371/journal.pone.0323949

Sazzli Kasim, Putri Nur Fatin Amir Rudin, Sorayya Malek, Nurulain Ibrahim, Xue Ning Kiew, Nafiza Mat Nasir, Khairul Shafiq Ibrahim, Raja Ezman Raja Shariff

PLoS ONE, volume 20, issue 6, e0323949. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12173414/pdf/

Citations: 0

Abstract


Enhanced cardiovascular risk prediction in the Western Pacific: A machine learning approach tailored to the Malaysian population.


Background: Cardiovascular disease (CVD) is a significant public health challenge in the Western Pacific region, including Malaysia.

Objective: This study aimed to develop and validate machine learning (ML) models to predict 10-year CVD risk in a Malaysian cohort, which could serve as a model for other Asian populations with similar genetic and environmental backgrounds.

Methods: Using data from the REDISCOVER Registry (5,688 participants from 2007 to 2017), 30 clinically relevant features were selected, and several ML algorithms were trained: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Neural Network (NN) and Naive Bayes (NB). Ensemble models were also created using three commonly used meta-learners: RF, Generalized Linear Model (GLM), and Gradient Boosting Model (GBM). The dataset was split 70:30 into training and test sets, with 5-fold cross-validation to ensure robust performance. Model evaluation was primarily based on the Area Under the Curve (AUC), with additional metrics such as sensitivity, specificity, and the Net Reclassification Index (NRI) to compare the ML models against traditional risk scores like the Framingham Risk Score (FRS) and Revised Pooled Cohort Equations (RPCE).
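The evaluation metrics named in the Methods can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and risk scores are all assumptions made for illustration. AUC is computed here in its rank-based (Mann-Whitney) form, one standard way to obtain it.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random event outranks a random non-event.

    A tie between an event and a non-event counts as half a win
    (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Treat score >= threshold as high risk; return (sensitivity, specificity)."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data invented for illustration only (not from the REDISCOVER Registry).
y = [1, 1, 1, 0, 0, 0, 0, 1]
risk = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.7, 0.6]

auc = roc_auc(y, risk)
# The 0.5 threshold is a hypothetical choice, not one stated in the abstract.
sens, spec = sensitivity_specificity(y, risk, threshold=0.5)
print(f"AUC={auc:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

AUC is independent of any threshold, whereas sensitivity and specificity depend on the cutoff chosen, which is why the abstract reports them as complementary metrics.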

Results: The LR model achieved the highest AUC of 0.77, outperforming the FRS (AUC = 0.72) and RPCE (AUC = 0.74). The ensemble model provided robust performance, though it did not significantly exceed the best individual model. SHAP (SHapley Additive exPlanations) analysis identified key predictors such as systolic blood pressure, weight and waist circumference. The study showed a significant NRI improvement of 13.15% compared to the FRS and 7.00% compared to the RPCE, highlighting the potential of ML approaches to enhance CVD risk prediction in Malaysia. The best-performing model was deployed on a web platform for real-time use, ensuring ongoing validation and clinical applicability.
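To make the NRI figures concrete, the following sketch computes a categorical Net Reclassification Index for a new model against a reference score. The abstract does not say which NRI variant or risk bands were used, so the categorical form, the 10% and 20% cutoffs, the function names, and the toy data below are assumptions for illustration only.

```python
from bisect import bisect_right
from typing import Sequence


def risk_category(risk: float, cutoffs: Sequence[float]) -> int:
    """Index of the risk band; cutoffs must be sorted in ascending order."""
    return bisect_right(cutoffs, risk)


def net_reclassification_index(
    y_true: Sequence[int],
    old_risk: Sequence[float],
    new_risk: Sequence[float],
    cutoffs: Sequence[float],
) -> float:
    """Categorical NRI = (P(up|event) - P(down|event))
    + (P(down|non-event) - P(up|non-event))."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for y, old, new in zip(y_true, old_risk, new_risk):
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if y == 1:
            n_event += 1
            up_event += move > 0      # correctly moved to a higher band
            down_event += move < 0    # wrongly moved to a lower band
        else:
            n_nonevent += 1
            up_nonevent += move > 0   # wrongly moved to a higher band
            down_nonevent += move < 0  # correctly moved to a lower band
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("NRI needs at least one event and one non-event")
    event_nri = (up_event - down_event) / n_event
    nonevent_nri = (down_nonevent - up_nonevent) / n_nonevent
    return event_nri + nonevent_nri


# Hypothetical risk bands: <10%, 10% to <20%, >=20%.
CUTOFFS = [0.10, 0.20]

# Toy data invented for illustration only (not study data).
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old = [0.05, 0.15, 0.25, 0.15, 0.15, 0.05, 0.25, 0.15, 0.05, 0.25]
new = [0.12, 0.22, 0.25, 0.08, 0.05, 0.05, 0.15, 0.15, 0.12, 0.25]

nri = net_reclassification_index(y, old, new, CUTOFFS)
print(f"NRI = {nri:.2%}")
```

A positive NRI means the new model moves more events into higher bands and more non-events into lower bands than the reverse, which is the sense in which the reported 13.15% and 7.00% favor the ML model over the FRS and RPCE.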

Conclusions: These findings underscore the effectiveness of ML models in improving CVD risk stratification and decision-making in Malaysia and beyond.


Source journal

PLoS ONE (Biology)

CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months

About the journal: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage