An interpretable machine learning model with demographic variables and dietary patterns for ASCVD identification: from U.S. NHANES 1999-2018.


BMC Medical Informatics and Decision Making, vol. 25(1), article 105. Pub Date: 2025-03-03. DOI: 10.1186/s12911-025-02937-5

Qun Tang, Yong Wang, Yan Luo



Abstract

Research on how demographic variables and dietary patterns relate to atherosclerotic cardiovascular disease (ASCVD) remains limited in breadth and depth. This study aimed to build a machine learning (ML) model that accurately and transparently characterizes the associations between demographic variables, dietary habits, and ASCVD. The data come from the United States National Health and Nutrition Examination Survey (U.S. NHANES), 1999-2018. Five ML models were developed to predict ASCVD, and the best-performing model was selected for further analysis. The study included 40,298 participants. Using 20 population characteristics, the eXtreme Gradient Boosting (XGBoost) model achieved an area under the curve (AUC) of 0.8143 and an accuracy of 88.4%. In the model, male sex, age, and smoking were each positively associated with ASCVD risk. Dairy product intake was negatively associated with ASCVD risk, whereas a lower intake of refined grains did not reduce it. The poverty income ratio and calorie intake showed non-linear associations with the disease. Overall, the XGBoost model was effective and precise in characterizing the relationship between ASCVD and the demographic characteristics and dietary intake of participants in the U.S. NHANES 1999-2018 dataset.
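The abstract reports two evaluation metrics, area under the curve and accuracy. As a minimal sketch of how both can be computed from a classifier's predicted risk scores, the following uses only the Python standard library. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made up for illustration and are not NHANES values.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a random positive outscores a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases classified correctly at the given score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Toy data invented for illustration only (1 = disease, 0 = no disease).
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_scores = [0.05, 0.10, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.45, 0.80]

toy_auc = roc_auc(toy_labels, toy_scores)   # 15 of 16 pairs ranked correctly
toy_acc = accuracy(toy_labels, toy_scores)  # 8 of 10 cases classified correctly

print(f"AUC = {toy_auc:.4f}, accuracy = {toy_acc:.1%}")
```

AUC depends only on how the scores rank cases, while accuracy depends on the chosen threshold, so the two numbers answer different questions and are usually reported together, as in the abstract.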

