Construction and validation of a machine learning-based model predicting early readmission in patients with decompensated cirrhosis: a prospective two-center cohort study.
IF 6.1; Zone 3 (Biology); JCR Q1, MATHEMATICAL & COMPUTATIONAL BIOLOGY
Fang Yang, Jia Li, Ziyi Yang, Liping Wu, Han Wang, Chao Sun
BioData Mining, 18(1): 63. Published 2025-09-24. Journal Article. DOI: https://doi.org/10.1186/s13040-025-00479-0. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462353/pdf/
Citations: 0
Abstract
Background: Early (30-day) readmission remains a significant socioeconomic and healthcare burden in decompensated cirrhosis. Early recognition and accurate identification of patients at risk are crucial. However, current evidence is inconclusive, and traditional scores of liver disease severity lack specificity and sensitivity. We sought to construct and validate an explainable machine learning (ML)-based prediction model and to evaluate its prognostic implementation in patients readmitted due to acute episodes. The prediction model was developed and validated in a two-center prospective investigation. The discovery sample, comprising 636 patients with cirrhosis, was divided into a training set and a test set, and an additional cohort of 150 patients served as external validation. Eleven ML methods were applied to build a predictive model from readily available electronic health record variables. Predictive performance was compared using the area under the ROC curve (AUC) alongside several other evaluation parameters. Feature importance was ranked, and the final model explained, with the SHapley Additive exPlanations (SHAP) method. Furthermore, prognostic implementation was verified by subgrouping patients according to the final model and clinical outcomes during follow-up.
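As a rough illustration of the AUC metric used above to compare models, the sketch below computes AUC as the probability that a randomly chosen readmitted patient receives a higher predicted score than a randomly chosen non-readmitted patient, with ties counted as one half. It is not the authors' code: the function name `roc_auc`, the toy labels, and both score vectors are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative cases.

    Equivalent to the normalized Mann-Whitney U statistic: each
    (positive, negative) pair scores 1 if the positive case is ranked
    higher, 0.5 on a tie, and 0 otherwise.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = readmitted within 30 days, 0 = not readmitted.
toy_labels = [1, 0, 1, 0, 0, 1]
model_a_scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.6]  # ranks most positives higher
model_b_scores = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # uninformative constant score

auc_a = roc_auc(toy_labels, model_a_scores)  # 7 of 9 pairs ordered correctly
auc_b = roc_auc(toy_labels, model_b_scores)  # every pair is a tie
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a demonstration; for cohorts of realistic size a rank-based implementation from an established library would be the usual choice.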
Results: Among all 11 ML algorithms, the random forest (RF) algorithm showed the best discriminatory ability. Feature reduction based on the importance ranking yielded a final, explainable 7-feature RF model. The model predicted early readmission with moderate accuracy in internal and external validation (AUCs of 0.853 and 0.838, respectively) and was further transformed into an online tool to facilitate daily practice. Patients classified as positive by the prediction model had more severe underlying disease and poor psychophysiologic reserve.
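The idea of ranking features by Shapley-based importance and keeping only the top few can be sketched on a toy problem. The example below computes exact Shapley values by enumerating feature subsets, ranks features by mean absolute Shapley value, and keeps the top k. Everything in it is an assumption made for illustration: the linear `toy_model` stands in for the study's random forest, the feature names and weights are hypothetical, "absent" features are replaced by a fixed baseline value, and k = 2 is used rather than the 7 features of the final model. In practice the SHAP library computes these attributions far more efficiently for tree ensembles.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def shapley_values(predict: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one instance by enumerating all subsets.

    Features outside the coalition are set to their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def rank_features(names: Sequence[str], predict: Model,
                  instances: Sequence[Sequence[float]],
                  baseline: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute Shapley value, largest first."""
    totals = [0.0] * len(names)
    for x in instances:
        for i, phi in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(phi)
    mean_abs = [t / len(instances) for t in totals]
    order = sorted(range(len(names)), key=lambda i: mean_abs[i], reverse=True)
    return [(names[i], mean_abs[i]) for i in order]


# Hypothetical stand-in model and data (not from the study).
WEIGHTS = [2.0, -1.0, 0.5, 0.0]
names = ["feature_a", "feature_b", "feature_c", "feature_d"]


def toy_model(z: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(WEIGHTS, z))


instances = [[1.0, 2.0, 0.0, 5.0], [3.0, 0.0, 2.0, 1.0]]
baseline = [0.0, 0.0, 0.0, 0.0]

ranking = rank_features(names, toy_model, instances, baseline)
top_features = [name for name, _ in ranking[:2]]  # keep the top k = 2
print(ranking)
print("retained:", top_features)
```

Exact enumeration grows exponentially with the number of features, so it only serves to make the definition concrete. A useful sanity check is additivity: for each instance, the Shapley values sum to the difference between the model output at that instance and at the baseline.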
Conclusions: The final explainable ML model predicted early readmission and was closely associated with adverse outcomes in individual patients with decompensated cirrhosis. Notably, by providing an indirect interpretation, it allayed the "black-box" concerns inherent to ML techniques.
About the journal:
BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data.
Topical areas include, but are not limited to:
-Development, evaluation, and application of novel data mining and machine learning algorithms.
-Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
-Open-source software for the application of data mining and machine learning algorithms.
-Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies.
-Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.