Explainable AI for colorectal cancer mortality and risk factor prediction in Korea: A nationwide cancer cohort study
Sang Won Park, Na Young Yeo, Tae-Hoon Kim, Myoung Nam Lim, Inhyeok Yim, Oh Beom Kwon, Seung-Joo Nam, Hui-Young Lee, Woo Jin Kim
International Journal of Medical Informatics, Volume 205, Article 106125. Published 2025-10-04. DOI: 10.1016/j.ijmedinf.2025.106125
https://www.sciencedirect.com/science/article/pii/S1386505625003429
Citations: 0
Abstract
Background
Colorectal cancer (CRC) prognosis varies significantly, yet conventional statistical models struggle to capture the complex, non-linear interactions among clinical variables. Furthermore, most predictive models are based on Western populations, limiting their applicability to Korean patients. This study aimed to develop an explainable AI (XAI) model for CRC mortality prediction using a nationwide Korean cohort to provide clinically actionable insights.
Methods
We conducted a retrospective cohort study using the Korean Cancer Public Library Database. A total of 9,069 patients with CRC were analyzed for all-cause mortality (1,878 deaths) and 8,589 patients for CRC-specific mortality (1,398 deaths). Four machine learning (ML) algorithms (support vector machine, random forest, XGBoost, and LightGBM) were constructed. We employed explainable AI techniques, including SHapley Additive exPlanations (SHAP), to quantify the contribution of each predictor and ensure model interpretability.
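To illustrate what "quantifying the contribution of each predictor" means, the sketch below computes exact Shapley values for a single patient by enumerating feature coalitions. It is a minimal illustration under stated assumptions, not the study's pipeline: the risk function `toy_risk`, the three features, and the patient and baseline values are all hypothetical, and the SHAP library itself uses faster approximations for tree models rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating feature coalitions.

    Features outside a coalition are held at their baseline value.
    """
    n = len(x)
    features = list(range(n))

    def value(coalition):
        # Model output when only the coalition's features take the instance's values.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score (NOT the study's model): stage and CEA interact.
def toy_risk(v):
    stage, cea, age = v
    return 0.10 * stage + 0.02 * cea + 0.01 * stage * cea + 0.001 * age


patient = [3, 10.0, 60]   # hypothetical: stage III, CEA 10 ng/mL, age 60
baseline = [1, 2.0, 60]   # hypothetical reference patient

phi = shapley_values(toy_risk, patient, baseline)
for name, p in zip(["stage", "CEA", "age"], phi):
    print(f"{name}: {p:+.4f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes SHAP values readable as a per-predictor breakdown of one prediction.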
Results
All models showed good discrimination (AUC: 0.82–0.94). LightGBM was the best-optimized model, with an AUC of 0.824 [95% CI 0.80–0.85] for all-cause mortality. For CRC-specific mortality, LightGBM again yielded an AUC of 0.867 [95% CI 0.84–0.89]. SHAP revealed tumor stage and carcinoembryonic antigen as the top mortality predictors across age groups. Metabolic markers (e.g., hypertension, cholesterol) and liver enzymes were more predictive in younger patients.
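An AUC reported with a 95% confidence interval, as above, can be obtained in several ways. The abstract does not say which method the authors used, so the following is only a sketch of one common option, the percentile bootstrap. The labels and scores are synthetic, and the function names `auc` and `bootstrap_auc_ci` are illustrative.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lower, upper


# Synthetic data (an assumption for illustration): 40 deaths among 200 patients,
# with predicted risks drawn from shifted normal distributions.
data_rng = random.Random(42)
labels = [1] * 40 + [0] * 160
scores = [data_rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

point, lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC {point:.3f} [95% CI {lower:.3f}-{upper:.3f}]")
```

The pairwise AUC computation is quadratic in sample size and is used here for clarity; for a cohort of several thousand patients a rank-based implementation would be the more practical choice.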
Conclusions
We developed the first interpretable machine learning model that accurately predicts CRC survival in a nationwide Korean cohort. Age-specific risk factors identified by SHAP not only support personalized care but also advance the application of precision oncology in Asian settings.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.