Machine learning models integrating dietary data predict all-cause mortality in U.S. NAFLD patients: an NHANES-based study
Pinchu Chen, Yao Li, Chenfenglin Yang, Qifan Zhang
Nutrition Journal, 24(1):100, published 2025-07-01. DOI: 10.1186/s12937-025-01170-0
Abstract
Background: Non-alcoholic fatty liver disease (NAFLD) is a leading cause of chronic liver disease and is closely associated with metabolic abnormalities and unhealthy lifestyle habits. Despite the critical role of diet in disease progression, most existing prognostic models for NAFLD do not incorporate dietary factors. This study integrates demographic, serological, and nutritional data to develop machine learning models that predict all-cause mortality risk in NAFLD patients, with a particular emphasis on dietary interventions.
Methods: Data from 2,589 NAFLD participants in the National Health and Nutrition Examination Survey (NHANES) 2007-2018 were analyzed. Variables associated with survival outcomes were selected using LASSO-Cox regression. Five machine learning models were developed: Random Survival Forest (RSF), Gradient Boosting Machine (GBM), CoxBoost, Survival Support Vector Machine (SurvivalSVM), and eXtreme Gradient Boosting (XGBoost). Their performance was evaluated using time-dependent AUC, ROC curves, the concordance index (C-index), the Brier score, and Kaplan-Meier analysis. SHAP (SHapley Additive exPlanations) values were used for model interpretability.
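The survival-evaluation tools named in the Methods can be illustrated with a small, dependency-free sketch. The helper names (`harrell_c_index`, `cumulative_dynamic_auc`, `kaplan_meier`) and the toy data are hypothetical and do not come from the study. A higher risk score is assumed to mean higher predicted mortality. The time-dependent AUC shown is a simplified, unweighted version that drops subjects censored before the horizon; survival analysis packages commonly apply inverse-probability-of-censoring weights instead.

```python
"""Toy sketches of survival-evaluation metrics (not the study's code)."""
from typing import List, Sequence, Tuple


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when i had an observed event before j's
    follow-up time; it is concordant when i also has the higher risk score.
    Tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def cumulative_dynamic_auc(times: Sequence[float], events: Sequence[int],
                           risk: Sequence[float], horizon: float) -> float:
    """Unweighted time-dependent AUC at `horizon` (no censoring weights).

    Cases died by the horizon; controls were still under follow-up after it.
    Subjects censored before the horizon are simply dropped.
    """
    cases = [risk[i] for i in range(len(times))
             if times[i] <= horizon and events[i]]
    controls = [risk[j] for j in range(len(times)) if times[j] > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = sum(1.0 if c > k else 0.5 if c == k else 0.0
                for c in cases for k in controls)
    return score / (len(cases) * len(controls))


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate as (event time, S(t)) steps."""
    curve, survival = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for s in times if s >= t)  # still followed at t
        deaths = sum(1 for s, e in zip(times, events) if s == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Made-up follow-up times (years), death indicators and model risk scores.
    times = [2, 3, 3, 5, 7, 8]
    events = [1, 1, 0, 1, 0, 1]
    risk = [0.9, 0.7, 0.4, 0.6, 0.65, 0.1]
    print(harrell_c_index(times, events, risk))  # 0.9
    print(cumulative_dynamic_auc(times, events, risk, 5))
    print(kaplan_meier(times, events))
```

On the toy data, a single discordant pair (the death at year 5 has a lower risk score than the subject still alive at year 7) pulls the C-index from 1.0 down to 0.9.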
Results: LASSO-Cox regression identified 13 significant variables, including age, household income, blood glucose, sedentary behavior, and dietary fiber intake. The GBM and RSF models showed strong predictive performance, with AUC values of approximately 0.8 for both 5- and 10-year survival, and also performed well on the C-index and Brier score. SHAP analysis showed that advanced age, low household income, hyperglycemia, and sedentary behavior were associated with poor prognosis, whereas higher dietary fiber intake was associated with improved survival.
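SHAP attributes each individual prediction to its input features using Shapley values. The sketch below computes exact Shapley values by brute force against a single reference profile, which only scales to a handful of features; the SHAP library itself relies on background datasets and faster model-specific algorithms such as TreeSHAP. The function names, feature names, coefficients, and patient values are invented for illustration and do not come from the study's models.

```python
"""Brute-force Shapley attributions for a toy risk model (illustrative only)."""
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet


def exact_shapley(model: Callable[[Dict[str, float]], float],
                  x: Dict[str, float],
                  baseline: Dict[str, float]) -> Dict[str, float]:
    """Shapley value of each feature for model(x) relative to `baseline`.

    Features outside a coalition keep their baseline value, so the
    attributions sum to model(x) - model(baseline). Cost grows as
    M * 2**M model calls for M features.
    """
    names = list(x)
    m = len(names)

    def value(coalition: FrozenSet[str]) -> float:
        point = {k: x[k] if k in coalition else baseline[k] for k in names}
        return model(point)

    phi = {}
    for feat in names:
        others = [k for k in names if k != feat]
        total = 0.0
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {feat}) - value(s))
        phi[feat] = total
    return phi


def toy_risk(p: Dict[str, float]) -> float:
    """Hypothetical linear risk score; coefficients are invented."""
    return 0.06 * p["age"] + 0.25 * p["glucose"] - 0.03 * p["fiber"]


if __name__ == "__main__":
    patient = {"age": 70.0, "glucose": 8.0, "fiber": 10.0}
    reference = {"age": 50.0, "glucose": 5.5, "fiber": 17.0}
    print(exact_shapley(toy_risk, patient, reference))
```

For a purely linear model such as `toy_risk`, each attribution reduces to coefficient x (patient value - reference value), e.g. 0.06 x (70 - 50) = 1.2 for age; a negative coefficient on a feature below its reference yields a positive attribution, as for fiber here (-0.03 x (10 - 17) = 0.21). With nonlinear models the attributions differ from this simple product, but they still sum to the difference between the patient's and the reference prediction.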
Conclusions: This study integrates dietary data into machine learning models, demonstrating the potential for predicting all-cause mortality in NAFLD patients. The models, particularly RSF and GBM, show robust predictive accuracy, with dietary fiber intake consistently exhibiting a protective effect on survival outcomes. These findings suggest that dietary interventions, such as increasing dietary fiber intake, could improve the long-term prognosis of NAFLD patients.
About the journal:
Nutrition Journal publishes surveillance, epidemiologic, and intervention research that sheds light on i) influences (e.g., familial, environmental) on eating patterns; ii) associations between eating patterns and health; and iii) strategies to improve eating patterns among populations. The journal also welcomes manuscripts reporting on the psychometric properties (e.g., validity, reliability) and feasibility of methods (e.g., for assessing dietary intake) for human nutrition research. In addition, study protocols for controlled trials and cohort studies, with an emphasis on methods for assessing dietary exposures and outcomes as well as intervention components, will be considered.
Manuscripts that consider eating patterns holistically, as opposed to solely reductionist approaches that focus on specific dietary components in isolation, are encouraged. Also encouraged are papers that take a holistic or systems perspective in attempting to understand possible compensatory and differential effects of nutrition interventions. The journal does not consider animal studies.
In addition to the influence of eating patterns on human health, we also invite research providing insights into the environmental sustainability of dietary practices. Again, a holistic perspective is encouraged, for example, through the consideration of how eating patterns might maximize both human and planetary health.