Using machine learning-based algorithms to construct cardiovascular risk prediction models for Taiwanese adults based on traditional and novel risk factors

IF 3.3, Zone 3 (Medicine), Q2 MEDICAL INFORMATICS

BMC Medical Informatics and Decision Making, Pub Date: 2024-07-22, DOI: 10.1186/s12911-024-02603-2

Chien-Hsiang Cheng, Bor-Jen Lee, Oswald Ndi Nfor, Chih-Hsuan Hsiao, Yi-Chia Huang, Yung-Po Liaw

Citations: 0

Abstract

Objective: To develop and validate machine learning models for predicting coronary artery disease (CAD) in a Taiwanese cohort, with an emphasis on identifying significant predictors and comparing the performance of different models.

Methods: Clinical, demographic, and laboratory data from 8,495 Taiwan Biobank (TWB) participants were analyzed after propensity score matching to address potential confounding. Key variables included age, gender, lipid profile (T-CHO, HDL_C, LDL_C, TG), smoking and alcohol consumption habits, and renal and liver function markers. The performance of multiple machine learning models was compared.

Results: The cohort included 1,699 individuals with CAD, identified through self-reported questionnaires. CAD and non-CAD individuals differed significantly in demographic and clinical features. The Gradient Boosting model performed best, with an AUC of 0.846 (95% confidence interval [CI] 0.819–0.873), sensitivity of 0.776 (95% CI 0.732–0.820), specificity of 0.759 (95% CI 0.736–0.782), and accuracy of 0.762 (95% CI 0.742–0.782). Age was the most influential predictor of CAD risk in this dataset.

Conclusions: The Gradient Boosting model showed superior performance in predicting CAD in the Taiwanese cohort, with age as a critical predictor. These findings underscore the potential of machine learning models to improve the accuracy of CAD prediction, thereby supporting early detection and targeted intervention strategies.

Trial registration: Not applicable.
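The abstract reports AUC, sensitivity, specificity and accuracy, each with a 95% CI, but does not state how the intervals were computed. The Python sketch below shows one common way to obtain such figures from a held-out test set, assuming binary labels (1 = CAD, 0 = non-CAD) and a continuous model score. It uses a Wald (normal-approximation) interval for the threshold-based proportions and the Hanley–McNeil standard error for the rank-based AUC. These interval choices, the 0.5 decision threshold, and all function names (`wald_proportion`, `confusion_counts`, `threshold_metrics`, `auc_hanley_mcneil`) are illustrative assumptions, not the authors' code, and the toy data are not from the study.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


@dataclass(frozen=True)
class Estimate:
    """A point estimate with lower and upper confidence limits."""
    value: float
    lower: float
    upper: float


def _clip(x: float) -> float:
    # Normal-approximation limits can fall outside [0, 1]; truncate them.
    return min(1.0, max(0.0, x))


def wald_proportion(successes: int, n: int, z: float = Z_95) -> Estimate:
    """Proportion successes/n with a Wald (normal-approximation) interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return Estimate(p, _clip(p - half), _clip(p + half))


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for labels coded 1 = CAD, 0 = non-CAD."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, Estimate]:
    """Sensitivity, specificity and accuracy, each with a 95% Wald interval."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": wald_proportion(tp, tp + fn),  # true-positive rate
        "specificity": wald_proportion(tn, tn + fp),  # true-negative rate
        "accuracy": wald_proportion(tp + tn, tp + fn + tn + fp),
    }


def auc_hanley_mcneil(y_true: Sequence[int], scores: Sequence[float], z: float = Z_95) -> Estimate:
    """Rank-based (Mann-Whitney) AUC with the Hanley-McNeil (1982) standard error."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    n1, n2 = len(pos), len(neg)
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one positive and one negative case")
    # Share of (positive, negative) pairs where the positive scores higher; ties count 1/2.
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    a = wins / (n1 * n2)
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    var = (a * (1.0 - a) + (n1 - 1) * (q1 - a * a) + (n2 - 1) * (q2 - a * a)) / (n1 * n2)
    half = z * math.sqrt(max(var, 0.0))
    return Estimate(a, _clip(a - half), _clip(a + half))


if __name__ == "__main__":
    # Toy data for illustration only; these are not values from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6, 0.35, 0.05]
    y_pred = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 decision threshold
    for name, est in threshold_metrics(y_true, y_pred).items():
        print(f"{name}: {est.value:.3f} (95% CI {est.lower:.3f}-{est.upper:.3f})")
    auc = auc_hanley_mcneil(y_true, scores)
    print(f"AUC: {auc.value:.3f} (95% CI {auc.lower:.3f}-{auc.upper:.3f})")
```

The pairwise AUC loop is quadratic in the test-set size, which is acceptable for a few thousand subjects; a sort-based rank computation scales better. Wald intervals can be unreliable for small counts or proportions near 0 or 1, where a Wilson interval is a common alternative.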

Source journal: BMC Medical Informatics and Decision Making (category: Medicine – Medical Informatics)

CiteScore: 7.20

Self-citation rate: 5.70%

Articles published: 297

Review time: 1 month

Journal description: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles on the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.