Development and validation of a machine learning-based clinical prediction model for monitoring liver injury in patients with pan-cancer receiving immunotherapy
IF 4.1; Zone 2, Medicine; JCR Q2, Computer Science, Information Systems
Yi Wang , Jing Lei , Zhiping Jin , Ying Jiang , Ningping Zhang , Minzhi Lv , Tianshu Liu
International Journal of Medical Informatics, volume 203, Article 106036. Published 2025-07-05. DOI: 10.1016/j.ijmedinf.2025.106036. URL: https://www.sciencedirect.com/science/article/pii/S1386505625002539
Citations: 0
Abstract
Background
Immune checkpoint inhibitor (ICI)-related liver injury poses a considerable clinical challenge for cancer patients. This study aimed to develop and validate an interpretable predictive model employing machine learning (ML) algorithms to accurately identify patients at high risk of acute liver injury within one month of initiating ICI treatment.
Methods
This longitudinal cohort study included pan-cancer patients who received their first ICI treatment between March 2019 and September 2022 at Zhongshan Hospital. Six ML algorithms, namely neural networks (NN), gradient boosting classifier (GBC), eXtreme gradient boosting (XGBoost), logistic regression (LR), categorical boosting classifier (CatBoost) and random forest (RF), were used to construct predictive models for acute ICI-related liver injury. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and Brier score (BS). The SHapley Additive exPlanations (SHAP) method was applied to rank feature importance and interpret the final model, showing how much each feature contributes to liver injury prediction and thereby enhancing clinical interpretability. This study is registered with the Chinese Clinical Trial Registry (ChiCTR2300067470).
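To make the evaluation metrics listed above concrete, the following is a minimal pure-Python sketch that computes them from true labels and predicted probabilities. It is not the authors' code: the function names `binary_metrics` and `auc`, the 0.5 decision threshold, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(y_true: List[int], y_prob: List[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Threshold-based metrics plus the Brier score for 0/1 labels.

    The 0.5 default threshold is an assumption, not a value from the study.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(y_true)
    # Brier score: mean squared error of the predicted probabilities.
    squared_error = sum((p - t) ** 2 for t, p in zip(y_true, y_prob))
    return {
        "accuracy": _ratio(tp + tn, n),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "brier": _ratio(squared_error, n),
    }


def auc(y_true: List[int], y_prob: List[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(P * N) and is meant
    for small examples only.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for this example (1 = liver injury, 0 = none).
    labels = [1, 1, 0, 0, 0, 1, 0, 0]
    probs = [0.9, 0.6, 0.4, 0.2, 0.7, 0.3, 0.1, 0.5]
    for name, value in binary_metrics(labels, probs).items():
        print(f"{name}: {value:.3f}")
    print(f"auc: {auc(labels, probs):.3f}")
```

AUC and the Brier score use the predicted probabilities directly, whereas accuracy, sensitivity, specificity, PPV and NPV depend on the chosen threshold, so the latter five change if the cut-off is moved.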
Results
A total of 863 patients were enrolled, of whom 22.71% experienced liver injury within one month of ICI initiation. Among the six preliminary models, the RF model performed best and was selected for development of the final model. SHAP was used to rank the variables in each of the six preliminary models, and the 10 variables in the final model were chosen by intersecting the top 20 most important variables across these models. The final RF model performed robustly, achieving an AUC of 0.81 (95% CI: 0.73–0.90) on the test set, and 0.79 (95% CI: 0.72–0.88) and 0.80 (95% CI: 0.72–0.89) in 5-fold and 10-fold cross-validation, respectively. Decision curve analysis (DCA) showed solid clinical benefit, and the calibration curve reflected good predictive consistency.
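The variable-selection step described above (keeping only features that rank in the top k for every preliminary model) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names `top_k` and `shared_top_features`, the placeholder feature names, and the importance values are all hypothetical, and the abstract does not say how ties were broken.

```python
from typing import Dict, List


def top_k(importance: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the largest importance, highest first.

    Ties are broken alphabetically; that rule is an assumption made here.
    """
    ranked = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def shared_top_features(per_model: Dict[str, Dict[str, float]],
                        k: int) -> List[str]:
    """Features that appear in the top-k list of every model, sorted by name."""
    top_sets = [set(top_k(scores, k)) for scores in per_model.values()]
    if not top_sets:
        return []
    return sorted(set.intersection(*top_sets))


if __name__ == "__main__":
    # Hypothetical importance scores (e.g. mean absolute SHAP value) for
    # three models; names and numbers are invented for this example.
    per_model = {
        "RF": {"feature_a": 0.30, "feature_b": 0.20,
               "feature_c": 0.10, "feature_d": 0.05},
        "XGBoost": {"feature_a": 0.25, "feature_b": 0.08,
                    "feature_c": 0.22, "feature_d": 0.15},
        "LR": {"feature_a": 0.35, "feature_b": 0.40,
               "feature_c": 0.01, "feature_d": 0.02},
    }
    print(shared_top_features(per_model, k=2))
```

In the study's setting, `per_model` would hold six entries and k would be 20. Taking the intersection favors features that every algorithm agrees on, at the cost of discarding features that matter strongly to only one model.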
Conclusion
An interpretable RF model was developed to predict acute liver injury occurring within one month of initiating ICI treatment. This clinically friendly model enables early identification of high-risk patients, facilitating optimized clinical management and ultimately improving treatment outcomes.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.