Development and validation of an ultrasound-based interpretable machine learning model for the classification of ≤3 cm hepatocellular carcinoma: a multicentre retrospective diagnostic study.
Zhicheng Du, Fangying Fan, Jun Ma, Jing Liu, Xing Yan, Xuexue Chen, Yangfang Dong, Jiapeng Wu, Wenzhen Ding, Qinxian Zhao, Yuling Wang, Guojun Zhang, Jie Yu, Ping Liang
EClinicalMedicine, volume 81, article 103098. Published 2025-02-13. DOI: https://doi.org/10.1016/j.eclinm.2025.103098. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11872562/pdf/
Abstract
Background: Our study aimed to develop a machine learning (ML) model utilizing grayscale ultrasound (US) to distinguish ≤3 cm small hepatocellular carcinoma (sHCC) from non-HCC lesions.
Methods: A total of 1052 patients with 1058 liver lesions ≤3 cm from 55 hospitals were collected between May 2017 and June 2021, and 756 liver lesions were randomly allocated into training and internal validation cohorts at an 8:2 ratio for the development and evaluation of ML models based on multilayer perceptron (MLP) and extreme gradient boosting (XGBoost) methods (ModelU, using US imaging features; ModelUR, adding US radiomics features; ModelURC, further adding clinical features). The diagnostic performance of the three models was assessed in an external validation cohort (312 liver lesions from 14 hospitals). The diagnostic efficacy of the optimal model was compared with that of radiologists in the external validation cohort. The SHapley Additive exPlanations (SHAP) method was employed to interpret the optimal ML model by ranking feature importance. The study was registered at ClinicalTrials.gov (NCT03871140).
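The idea behind the SHAP-based feature ranking mentioned in the Methods can be illustrated with a minimal sketch. It computes exact Shapley values for a single prediction by enumerating every feature coalition, then ranks features by absolute contribution. Everything here is an assumption made for illustration: `toy_model`, the names `feature_a` to `feature_c`, and the all-zero baseline are hypothetical and are not the study's model, features, or pipeline. The authors used the SHAP method on their trained model; brute-force enumeration is shown only because it is easy to check by hand on a three-feature example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance, by enumerating all coalitions.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Coalition members keep the instance's value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical scoring function: one additive term plus one interaction.
    return 0.1 + 0.5 * z[0] + 0.3 * z[1] * z[2]


# Hypothetical feature names, instance, and baseline (illustration only).
names = ["feature_a", "feature_b", "feature_c"]
instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, instance, baseline)

# Rank features by the magnitude of their contribution to this prediction.
ranking = sorted(zip(names, phi), key=lambda pair: abs(pair[1]), reverse=True)
for name, contribution in ranking:
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the model output for the instance and for the baseline, which is the additivity property that makes Shapley-based rankings interpretable. A global ranking, as described in the abstract, would typically average the absolute contributions over many lesions rather than use a single instance.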
Findings: The XGBoost-based ModelURC showed the best performance (AUC = 0.934; 95% CI: 0.894-0.974) in the internal validation cohort. In the external validation cohort, ModelURC also achieved the highest AUC (AUC = 0.899; 95% CI: 0.861-0.931). In subgroup analyses, no statistically significant differences in the diagnostic performance of ModelURC were observed either between tumor sizes of ≤2.0 cm and 2.1-3.0 cm or across different HCC risk stratifications. ModelURC outperformed all radiologists, and ModelURC assistance significantly improved the diagnostic AUC for all radiologists (all P < 0.0001).
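For readers unfamiliar with how an AUC and its 95% CI are obtained, the following is a minimal sketch on synthetic data. The AUC is computed as the fraction of positive/negative pairs in which the positive case receives the higher score (ties count as one half), and the interval is a percentile bootstrap. The abstract does not state which CI method the authors used, so the bootstrap is an assumption, as are the function names, the sample sizes, and the simulated scores; none of the numbers printed here relate to the study's results.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair scores 1 for a win, 0.5 for a tie, 0 otherwise.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (labels must be 0/1)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic scores (illustration only): positives tend to score higher.
rng = random.Random(42)
labels = [1] * 50 + [0] * 50
scores = [rng.gauss(1.5, 1.0) for _ in range(50)] + [rng.gauss(0.0, 1.0) for _ in range(50)]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f}; 95% CI: {low:.3f}-{high:.3f}")
```

The pairwise formulation is quadratic in sample size, which is fine for a few hundred lesions; larger datasets would normally use a rank-based computation or a library routine instead.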
Interpretation: A diagnostic model for sHCC was developed and validated using ML and grayscale US from large cohorts. This model significantly improved the diagnostic performance of grayscale US for sHCC compared with experts.
Funding: This work was supported by the National Key Research and Development Program of China (2022YFC2405500), the Major Research Program of the National Natural Science Foundation of China (92159305), the National Science Fund for Distinguished Young Scholars (82325027), the Key Project of the National Natural Science Foundation of China (82030047), the Military Fund for Geriatric Diseases (20BJZ42), the National Natural Science Foundation of China Special Program (82441011), the National Natural Science Foundation of China (82402280), the National Natural Science Foundation of China (32171363), and the Key Research and Development Program for Social Development of Yunnan Science and Technology Department (202403AC100014).
About the journal:
eClinicalMedicine is a gold open-access clinical journal designed to support frontline health professionals in addressing the complex and rapid health transitions affecting societies globally. The journal aims to assist practitioners in overcoming healthcare challenges across diverse communities, spanning diagnosis, treatment, prevention, and health promotion. Integrating disciplines from various specialties and life stages, it seeks to enhance health systems as fundamental institutions within societies. With a forward-thinking approach, eClinicalMedicine aims to redefine the future of healthcare.