Machine learning models for predicting metabolic dysfunction-associated steatotic liver disease prevalence using basic demographic and clinical characteristics.

IF 6.1; Zone 2 (Medicine); Q1 (Medicine, Research & Experimental)

Journal of Translational Medicine. Pub Date: 2025-03-28. DOI: 10.1186/s12967-025-06387-5

Gangfeng Zhu, Yipeng Song, Zenghong Lu, Qiang Yi, Rui Xu, Yi Xie, Shi Geng, Na Yang, Liangjian Zheng, Xiaofei Feng, Rui Zhu, Xiangcai Wang, Li Huang, Yi Xiang

Journal of Translational Medicine, volume 23, issue 1, page 381. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951774/pdf/

Citations: 0

Abstract

Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) is a global health concern that necessitates early screening and timely intervention to improve prognosis. The current diagnostic protocols for MASLD involve complex procedures in specialised medical centres. This study aimed to explore the feasibility of utilising machine learning models to accurately screen for MASLD in large populations based on a combination of essential demographic and clinical characteristics.

Methods: A total of 10,007 outpatients who underwent transient elastography at the First Affiliated Hospital of Gannan Medical University were enrolled to form a derivation cohort. Using eight demographic and clinical characteristics (age, educational level, height, weight, waist and hip circumference, and history of hypertension and diabetes), we built predictive models for MASLD (classified as none or mild: controlled attenuation parameter (CAP) ≤ 269 dB/m; moderate: 269-296 dB/m; severe: CAP > 296 dB/m) employing 10 machine learning algorithms: logistic regression (LR), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), bootstrap aggregating, decision tree, K-nearest neighbours, light gradient boosting machine, naive Bayes, random forest, and support vector machine. These models were externally validated using the National Health and Nutrition Examination Survey (NHANES) 2017-2023 datasets.
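The three-level outcome described in the Methods can be sketched as a small labelling function. This is a minimal illustration, assuming only the CAP cut-offs quoted in the abstract (269 and 296 dB/m); the function name, constant names, and label strings are hypothetical and do not come from the paper.

```python
import math

# Cut-offs in dB/m, as quoted in the abstract.
MILD_MAX_CAP = 269.0      # none or mild: CAP <= 269
MODERATE_MAX_CAP = 296.0  # moderate: 269 < CAP <= 296; severe: CAP > 296


def cap_to_grade(cap_db_per_m: float) -> str:
    """Map a controlled attenuation parameter reading to a steatosis grade.

    The label strings are illustrative placeholders, not the paper's coding.
    """
    if math.isnan(cap_db_per_m) or cap_db_per_m < 0:
        raise ValueError("CAP must be a non-negative number")
    if cap_db_per_m <= MILD_MAX_CAP:
        return "none_or_mild"
    if cap_db_per_m <= MODERATE_MAX_CAP:
        return "moderate"
    return "severe"


# Example: label a handful of readings, including both boundary values.
grades = [cap_to_grade(c) for c in (210.0, 269.0, 280.5, 296.0, 310.0)]
```

The boundary handling follows the abstract's wording: a reading of exactly 269 dB/m falls in the none-or-mild group, and exactly 296 dB/m is treated as moderate because the severe group is defined as strictly greater than 296.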

Results: In the hospital outpatient cohort, machine learning algorithms demonstrated robust predictive capabilities. Notably, LR achieved the highest accuracy (ACC) of 0.711 in the test cohort and 0.728 in the validation cohort, with area under the receiver operating characteristic curve (AUC) values of 0.798 and 0.806, respectively. Similarly, MLP and XGBoost showed promising results, with MLP achieving an ACC of 0.735 in the test cohort, and XGBoost registering an AUC of 0.798. External validation using the NHANES datasets yielded consistent AUC results, with LR (0.831), MLP (0.823), and XGBoost (0.784) performing robustly.
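For readers unfamiliar with the two reported metrics, the sketch below shows how ACC and AUC can be computed for a three-class outcome. The abstract does not state how AUC was aggregated across the three classes, so the one-vs-rest macro average used here is an assumption, and all function names are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random
    negative, counting ties as one half (pairwise definition, O(n^2))."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of one-vs-rest AUCs (assumed averaging scheme).

    proba[i][k] is the predicted probability that sample i belongs to class k.
    """
    n_classes = len(proba[0])
    per_class = []
    for k in range(n_classes):
        labels = [1 if t == k else 0 for t in y_true]
        scores = [row[k] for row in proba]
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / n_classes
```

The pairwise AUC definition is chosen for readability rather than speed; for cohorts of the size described here, a rank-based implementation from an established library would be the practical choice.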

Conclusions: This study demonstrated that machine learning models constructed using a combination of essential demographic and clinical characteristics can accurately screen for MASLD in the general population. This approach significantly enhances the feasibility, accessibility, and compliance of MASLD screening and provides an effective tool for large-scale health assessments and early intervention strategies.

Source journal: Journal of Translational Medicine (Medicine: Research & Experimental)

CiteScore: 10.00

Self-citation rate: 1.40%

Articles published: 537

Review time: 1 month

About the journal: The Journal of Translational Medicine is an open-access journal that publishes articles focusing on information derived from human experimentation to enhance communication between basic and clinical science. It covers all areas of translational medicine.