Interpretable machine learning models based on body composition and inflammatory nutritional index (BCINI) to predict early postoperative recurrence of colorectal cancer: Multi-center study
IF 4.8 | Zone 2, Medicine | JCR Q1: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Yongjie Zhou, Jinhong Zhao, Fei Zou, Yongming Tan, Wei Zeng, Jiahui Jiang, Jiale Hu, Qiao Zeng, Lianggeng Gong, Lan Liu, Linhua Zhong
Computer Methods and Programs in Biomedicine, volume 269, Article 108874. Published 2025-05-22. DOI: 10.1016/j.cmpb.2025.108874
https://www.sciencedirect.com/science/article/pii/S0169260725002913
Citations: 0
Abstract
Background and objective
Colorectal cancer (CRC) ranks among the most prevalent cancers worldwide, with early postoperative recurrence remaining a major cause of mortality. Body composition and inflammatory-nutritional indices (BCINI) have demonstrated potential in reflecting patients’ physiological states; however, their association with early recurrence (ER) after CRC resection remains unclear. This study aimed to establish and validate interpretable machine learning (ML) models based on BCINI to predict ER after CRC resection.
Methods
Data from three hospitals were collected, including CT-based body composition metrics and blood test variables. After variable selection, six ML algorithms—XGBoost, Complement Naive Bayes (CNB), support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), and Gaussian Naive Bayes (GNB)—were used to construct ER prediction models. Optimal model selection was based on receiver operating characteristic (ROC) curve analysis. The selected model was externally validated using independent datasets to assess generalizability, while its accuracy and clinical utility were evaluated via calibration curves and decision curve analysis. Additionally, SHapley Additive exPlanations were employed to visualize prediction processes for clinical interpretability.
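The abstract does not say how the decision curve analysis was computed. As an illustration only, the sketch below uses the standard net-benefit definition, NB = TP/n - (FP/n) * pt/(1 - pt), where pt is the risk threshold at which a patient would be treated. The function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on every patient whose predicted risk >= threshold.

    Standard decision-curve formula: TP/n - FP/n * (pt / (1 - pt)).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True positives: flagged patients who did recur.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    # False positives: flagged patients who did not recur.
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = early recurrence), not study data.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(f"model: {nb_model:.3f}, treat-all: {nb_all:.3f}, treat-none: 0.000")
```

A decision curve plots these quantities over a range of thresholds. A model is usually read as clinically useful at a given threshold when its net benefit exceeds both the treat-all and the treat-none (zero) strategies.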
Results
The XGBoost algorithm outperformed other methods in model selection, demonstrating superior accuracy and stability with area under the ROC curve (AUC) values of 0.837 and 0.777 in internal training and validation sets, respectively. This model achieved the lowest Brier score of 0.131 on calibration curves, surpassing the five other ML algorithms. External validation further confirmed its generalizability, yielding AUC values of 0.783 and 0.773 in two independent datasets. Consistent predictive performance was observed across age subgroups (<60 years: AUC 0.762–0.834; ≥60 years: AUC 0.777–0.800) and tumor location subgroups (colon: AUC 0.785–0.845; rectum: AUC 0.751–0.799).
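The two headline metrics above, AUC and the Brier score, can be computed as in the following minimal sketch. It uses the textbook definitions (AUC as the probability that a randomly chosen recurrence case is ranked above a randomly chosen non-recurrence case; Brier score as the mean squared difference between predicted probability and outcome). The toy data are hypothetical, and the study's own software is not described in the abstract.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC via pairwise comparison of positives and negatives.

    A positive scored above a negative counts 1, a tie counts 0.5.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data (1 = early recurrence), not study data.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

print(f"AUC:   {roc_auc(y_true, y_prob):.3f}")
print(f"Brier: {brier_score(y_true, y_prob):.3f}")
```

AUC measures discrimination only, while the Brier score also penalizes poorly calibrated probabilities (lower is better), which is why the two are reported side by side.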
Conclusions
The interpretable ML model developed based on BCINI shows promise in predicting ER of CRC. This approach may provide valuable insights for clinical decision-making, enabling early detection and intervention to improve patient outcomes.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.