Using machine learning-based algorithms to construct cardiovascular risk prediction models for Taiwanese adults based on traditional and novel risk factors
Authors: Chien-Hsiang Cheng, Bor-Jen Lee, Oswald Ndi Nfor, Chih-Hsuan Hsiao, Yi-Chia Huang, Yung-Po Liaw
Journal: BMC Medical Informatics and Decision Making (Journal Article), published 2024-07-22
DOI: https://doi.org/10.1186/s12911-024-02603-2
Abstract
This study aimed to develop and validate machine learning models for predicting coronary artery disease (CAD) within a Taiwanese cohort, with an emphasis on identifying significant predictors and comparing the performance of various models. It analyzed clinical, demographic, and laboratory data from 8,495 subjects in the Taiwan Biobank (TWB) after propensity score matching to address potential confounding factors. Key variables included age, gender, lipid profiles (T-CHO, HDL_C, LDL_C, TG), smoking and alcohol consumption habits, and renal and liver function markers. The performance of multiple machine learning models was evaluated. The cohort comprised 1,699 individuals with CAD, identified through self-reported questionnaires. Significant differences in demographics and clinical features were observed between CAD and non-CAD individuals. The Gradient Boosting model performed best, achieving an AUC of 0.846 (95% confidence interval [CI], 0.819–0.873), a sensitivity of 0.776 (95% CI, 0.732–0.820), a specificity of 0.759 (95% CI, 0.736–0.782), and an accuracy of 0.762 (95% CI, 0.742–0.782). Age was the most influential predictor of CAD risk in the studied dataset. These findings underscore the potential of machine learning models to improve CAD prediction accuracy, thereby supporting early detection and targeted intervention strategies.
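The abstract reports each metric with a 95% confidence interval but does not say how the intervals were derived. As a rough illustration only, the sketch below assumes a normal-approximation (Wald) interval for sensitivity, specificity, and accuracy, and computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function names, confusion-matrix counts, and scores are all made up for the example and are not taken from the study.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a proportion.

    Uses the normal-approximation (Wald) interval; this is an assumed
    method, not necessarily the one used in the paper.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy with 95% Wald intervals."""
    return {
        "sensitivity": wald_ci(tp, tp + fn),            # true positive rate
        "specificity": wald_ci(tn, tn + fp),            # true negative rate
        "accuracy": wald_ci(tp + tn, tp + fn + tn + fp),
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=80, fn=20, tn=300, fp=100)
for name, (est, lo, hi) in metrics.items():
    print(f"{name}: {est:.3f} (95% CI {lo:.3f}-{hi:.3f})")

# Hypothetical predicted risk scores for CAD and non-CAD subjects.
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"AUC: {auc:.3f}")
```

The pairwise AUC loop is quadratic in the number of subjects, which is fine for a small example; a rank-based computation or an established library routine would be the usual choice for a cohort of this size.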
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.