An interpretable machine learning model with demographic variables and dietary patterns for ASCVD identification: from U.S. NHANES 1999-2018
Authors: Qun Tang, Yong Wang, Yan Luo
Journal: BMC Medical Informatics and Decision Making, 25(1): 105 (published 2025-03-03)
DOI: 10.1186/s12911-025-02937-5
PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11874124/pdf/
Citations: 0
Abstract
Research on how demographic variables and dietary patterns relate to atherosclerotic cardiovascular disease (ASCVD) remains limited in breadth and depth. This study aimed to build a machine learning (ML) model that identifies associations between demographic variables, dietary habits, and ASCVD both accurately and transparently. The data came from the United States National Health and Nutrition Examination Survey (U.S. NHANES), 1999-2018. Five ML models were developed to predict ASCVD, and the best-performing model was selected for further analysis. The study included 40,298 participants. Using 20 population characteristics, the eXtreme Gradient Boosting (XGBoost) model performed well, achieving an area under the curve (AUC) of 0.8143 and an accuracy of 88.4%. Male sex, age, and smoking were each positively associated with ASCVD risk. Dairy product intake was negatively associated with risk, whereas a lower intake of refined grains did not reduce ASCVD risk. The poverty income ratio and calorie intake showed non-linear associations with the disease. Overall, the XGBoost model characterized the relationship of participants' demographic characteristics and dietary intake with ASCVD in the U.S. NHANES 1999-2018 dataset with strong efficacy and precision.
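The abstract reports two metrics, an AUC of 0.8143 and an accuracy of 88.4%, which measure different things. As a hedged illustration only (not the authors' code, and assuming the reported AUC is the ROC AUC), the minimal Python sketch below computes accuracy at an assumed 0.5 decision threshold and AUC via the rank-based Mann-Whitney formulation. The function names and the labels and scores are hypothetical toy values, not NHANES data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded score matches the 0/1 label.

    The 0.5 threshold is an assumption for illustration.
    """
    correct = sum(int(s >= threshold) == y for y, s in zip(y_true, y_score))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney U formulation); O(P*N), fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: 2 positive cases, 8 negative cases.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.30, 0.45]
print(accuracy(labels, scores))  # 0.8
print(roc_auc(labels, scores))   # 0.84375

# A constant-score "model" keeps the same accuracy but has no ranking ability.
constant = [0.1] * len(labels)
print(accuracy(labels, constant), roc_auc(labels, constant))  # 0.8 0.5
```

With imbalanced labels, a model that scores every case below the threshold still reaches 80% accuracy in this toy example while its AUC drops to 0.5. This is why a threshold-free ranking metric such as AUC is commonly reported alongside accuracy.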
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.