Development and validation of a new diagnostic prediction model for NAFLD based on machine learning algorithms in NHANES 2017-2020.3.

IF 2.4 · Zone 4 (Medicine) · JCR Q3, Endocrinology & Metabolism
Yazhi Wang, Peng Wang
Journal: Hormones-International Journal of Endocrinology and Metabolism
Published: 2025-02-13 (Journal Article)
DOI: 10.1007/s42000-025-00634-6 (https://doi.org/10.1007/s42000-025-00634-6)
Citations: 0

Abstract

Development and validation of a new diagnostic prediction model for NAFLD based on machine learning algorithms in NHANES 2017-2020.3.

Aims: Nonalcoholic fatty liver disease (NAFLD) is a multisystem disease that can trigger the metabolic syndrome. Early prevention and treatment of NAFLD remain a major challenge for patients and clinicians. The aim of this study was to develop and validate machine learning (ML)-based predictive models. The best-performing model would then be turned into a set of simple arithmetic tools for predicting an individual's risk of NAFLD.

Methods: Statistical analyses were performed in 2428 individuals extracted from the National Health and Nutrition Examination Survey (NHANES, cycle 2017-2020.3) database. Feature variables were selected by the least absolute shrinkage and selection operator (LASSO) regression. Seven ML algorithms, including logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting (XGB), K-nearest neighbor (KNN), light gradient boosting machine (LightGBM), and multilayer perceptron (MLP), were used to construct models based on the feature variables and evaluate their performance. The model with the best performance was transformed into a diagnostic predictive nomogram (DPN). The DPN was developed into an online calculator and an Excel algorithm tool. Receiver operating characteristic (ROC) curve, decision curve analysis (DCA), and subgroup analyses were used to compare and assess the predictive abilities of the DPN and six existing NAFLD predictive models, including the ZJU index, the hepatic steatosis index (HSI), the triglyceride-glucose index (TyG), the Framingham steatosis index (FSI), the fatty liver index (FLI), and the visceral adiposity index (VAI).
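The two evaluation measures named above, ROC analysis and decision curve analysis, can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names `roc_auc` and `net_benefit` and the toy data are assumptions made for illustration. AUC is computed as the share of positive/negative pairs ranked correctly, and net benefit uses the usual decision-curve form TP/n - FP/n × pt/(1 - pt) at risk threshold pt.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count correctly ordered pairs; a tie counts as half a pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit when everyone with score >= threshold is flagged."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the chosen threshold.
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data invented for illustration; these are not NHANES values.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print(f"AUC = {roc_auc(labels, scores):.3f}")
print(f"Net benefit at 0.5 = {net_benefit(labels, scores, 0.5):.3f}")
```

Comparing two models on the same individuals then amounts to calling both functions with each model's scores; a decision curve is the net benefit evaluated over a range of thresholds.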

Results: Among the 2428 participants, the prevalence of NAFLD was 47.45%. LASSO regression selected eight of 39 candidate variables: body mass index (BMI), waist circumference (WC), alanine aminotransferase (ALT), triglyceride (TG), diabetes, hypertension, uric acid (UA), and race. Of the models built with the seven algorithms, the LR-based model performed best in terms of area under the curve (AUC, 0.823), accuracy (0.754), precision (0.768), specificity (0.804), and positive predictive value (0.768). It was then transformed into the DPN, which was implemented as an online calculator and an Excel algorithm tool. The diagnostic accuracy (AUC 0.856, 95% confidence interval (CI) 0.839-0.874 in the training set; AUC 0.823, 95% CI 0.793-0.854 in the validation set) and net clinical benefit of the DPN were superior to those of the ZJU, HSI, TyG, FSI, FLI, and VAI indices. These results held in subgroup analyses.
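The threshold-based metrics reported above all derive from a single confusion matrix. The sketch below shows the standard definitions, assuming a hypothetical `Confusion` class and invented counts (the paper's actual counts are not given in the abstract). It also makes explicit that precision and positive predictive value are the same quantity, which is why both appear as 0.768.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of true/false positives and negatives at one decision threshold."""
    tp: int
    fp: int
    tn: int
    fn: int

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        # Identical to positive predictive value (PPV).
        return self.tp / (self.tp + self.fp)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)


# Invented counts for illustration; not taken from the study.
cm = Confusion(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={cm.accuracy():.3f} precision/PPV={cm.precision():.3f} "
      f"specificity={cm.specificity():.3f} sensitivity={cm.sensitivity():.3f}")
```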

Conclusions: An ML-based LR model was developed and showed good performance. The DPN can be used as an individualized tool for rapid detection of NAFLD.
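To show why a logistic regression model lends itself to a calculator or spreadsheet tool, here is a minimal sketch of the arithmetic involved: a weighted sum of predictors passed through the logistic function. The abstract does not report the fitted coefficients, so every number in `COEFFS` and `INTERCEPT` is a placeholder, the units are assumptions, and race (a categorical predictor that would need dummy coding) is left out for brevity. The output of this sketch has no clinical meaning.

```python
import math

# PLACEHOLDER coefficients, invented for illustration; NOT the published model.
# Assumed units: BMI in kg/m^2, waist in cm, ALT in U/L, TG and uric acid in mg/dL;
# diabetes and hypertension coded 0/1.
COEFFS = {
    "bmi": 0.08,
    "waist_cm": 0.03,
    "alt": 0.02,
    "tg": 0.004,
    "diabetes": 0.5,
    "hypertension": 0.3,
    "uric_acid": 0.15,
}
INTERCEPT = -8.0  # placeholder


def nafld_risk(features: dict) -> float:
    """Predicted probability from a logistic model: sigmoid(intercept + sum(b_i * x_i))."""
    z = INTERCEPT + sum(COEFFS[name] * features[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical individual.
person = {"bmi": 31.0, "waist_cm": 104.0, "alt": 35.0, "tg": 180.0,
          "diabetes": 1, "hypertension": 0, "uric_acid": 6.2}
print(f"Predicted probability (placeholder model): {nafld_risk(person):.2f}")
```

A nomogram encodes the same weighted sum graphically: each predictor's contribution is read off as points, and the total maps to a probability.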

Source journal
CiteScore: 5.90
Self-citation rate: 0.00%
Articles published: 76
Review time: 6-12 weeks
About the journal: Hormones-International Journal of Endocrinology and Metabolism is a quarterly international journal with an international editorial board. It aims to provide a forum covering all fields of endocrinology and metabolic disorders, such as disruption of glucose homeostasis (diabetes mellitus), impaired homeostasis of plasma lipids (dyslipidemia), disorders of bone metabolism (osteoporosis), and disturbances of endocrine function and reproductive capacity in women and men. The journal particularly encourages clinical, translational, and basic science submissions in the areas of endocrine cancers, nutrition, obesity and metabolic disorders, quality of life in endocrine diseases, and the epidemiology of endocrine and metabolic disorders.