Construction and validation of a machine learning-based model predicting early readmission in patients with decompensated cirrhosis: a prospective two-center cohort study.

Impact Factor 6.1 · CAS Zone 3 (Biology) · JCR Q1, Mathematical & Computational Biology

Biodata Mining · Pub Date: 2025-09-24 · DOI: 10.1186/s13040-025-00479-0

Fang Yang, Jia Li, Ziyi Yang, Liping Wu, Han Wang, Chao Sun

Biodata Mining, vol. 18, no. 1, p. 63. Publication type: Journal Article. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462353/pdf/

Citations: 0

Abstract

Background: Early (30-day) readmission remains a significant socioeconomic and healthcare burden in decompensated cirrhosis. Early recognition and accurate identification are crucial. However, current evidence is inconclusive, and traditional liver disease severity scores lack specificity and sensitivity. We sought to construct and validate an explainable machine learning (ML)-based prediction model and to evaluate its prognostic implementation in patients readmitted due to acute episodes.

Methods: The prediction model was developed and validated in a two-center prospective study. The discovery sample, comprising 636 patients with cirrhosis, was divided into a training set and a test set, and an additional cohort of 150 patients served as external validation. Eleven ML methods were applied to build a predictive model from a variety of easily obtainable electronic health record variables. Predictive performance was compared using the area under the receiver operating characteristic (ROC) curve (AUC) alongside several other evaluation parameters. For feature importance and final model explanation, we adopted the SHapley Additive exPlanations (SHAP) method for ranking. Furthermore, prognostic implementation was verified by subgrouping patients according to the final model and clinical outcomes during follow-up.
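The AUC used to compare the candidate models can be read as the probability that a randomly chosen readmitted patient receives a higher predicted score than a randomly chosen non-readmitted patient. The sketch below illustrates that definition only; the labels, scores, and model names are hypothetical toy values, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive case scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = readmitted within 30 days, 0 = not readmitted.
y_true = [1, 0, 1, 0, 0, 1]

# Hypothetical predicted probabilities from two candidate models.
model_scores = {
    "model_a": [0.9, 0.2, 0.6, 0.7, 0.1, 0.8],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.3, 0.8],
}

# Score every candidate on the same held-out labels and keep the best one.
aucs = {name: roc_auc(y_true, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

The pairwise form is quadratic in the number of patients, which is fine for cohorts of this size; library implementations typically use a sort-based equivalent.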

Results: Among the 11 ML algorithms, random forest (RF) showed the best discriminatory capability. Feature reduction based on the importance ranking yielded a final, explainable 7-feature RF model. The model predicted early readmission with moderate accuracy in both internal and external validation, with AUCs of 0.853 and 0.838, respectively, and was further converted into an online tool to facilitate daily practice. Patients classified as positive by the prediction model had more severe underlying disease and poor psychophysiologic reserve.
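The importance ranking rests on Shapley values, which split a single prediction among the input features by averaging each feature's marginal contribution over all orders in which features could be added. The following is a minimal sketch of that idea using exact enumeration on a made-up three-feature risk function; `toy_risk`, its coefficients, and the zero baseline are assumptions for illustration, and this is not the study's RF model or the SHAP library's tree-specific algorithm.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one sample.

    A feature that is "absent" takes its baseline value. Each feature's
    attribution is its marginal contribution averaged over all orderings.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i on
            new = predict(current)
            phi[i] += new - prev       # marginal contribution in this ordering
            prev = new
    return [p / len(orderings) for p in phi]


def toy_risk(features):
    """Hypothetical risk score with one interaction term (a * c)."""
    a, b, c = features
    return 0.5 * a + 0.2 * b + 0.1 * a * c


x = [1.0, 1.0, 1.0]          # hypothetical patient
baseline = [0.0, 0.0, 0.0]   # hypothetical reference values

phi = shapley_values(toy_risk, x, baseline)

# Rank features by absolute attribution, largest first.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
print(phi, ranking)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes the ranking interpretable. Exact enumeration grows factorially with the number of features, so it is only practical for very small feature sets such as this one.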

Conclusions: The final explainable ML model was capable of predicting early readmission and was closely associated with adverse outcomes in individual patients with decompensated cirrhosis. Notably, it allayed the "black-box" concerns inherent to ML techniques through indirect interpretation.


Source journal

Biodata Mining (Mathematical & Computational Biology)

CiteScore: 7.90
Self-citation rate: 0.00%
Review time: 23 weeks

About the journal: BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to:
- Development, evaluation, and application of novel data mining and machine learning algorithms.
- Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
- Open-source software for the application of data mining and machine learning algorithms.
- Design, development, and integration of databases, software, and web services for the storage, management, retrieval, and analysis of data from large-scale studies.
- Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.