Explainable AI for colorectal cancer mortality and risk factor prediction in Korea: A nationwide cancer cohort study

Impact factor 4.1; Zone 2 (Medicine); JCR Q2 (Computer Science, Information Systems)
Sang Won Park, Na Young Yeo, Tae-Hoon Kim, Myoung Nam Lim, Inhyeok Yim, Oh Beom Kwon, Seung-Joo Nam, Hui-Young Lee, Woo Jin Kim
DOI: 10.1016/j.ijmedinf.2025.106125
Journal: International Journal of Medical Informatics, volume 205, Article 106125
Publication date: 2025-10-04
Publication type: Journal Article
Open access: no
Source: https://www.sciencedirect.com/science/article/pii/S1386505625003429
Citations: 0

Abstract


Background

Colorectal cancer (CRC) prognosis varies significantly, yet conventional statistical models struggle to capture the complex, non-linear interactions among clinical variables. Furthermore, most predictive models are based on Western populations, limiting their applicability to Korean patients. This study aimed to develop an explainable AI (XAI) model for CRC mortality prediction using a nationwide Korean cohort to provide clinically actionable insights.

Methods

We conducted a retrospective cohort study using the Korean Cancer Public Library Database. A total of 9,069 patients with CRC were analyzed for all-cause mortality (1,878 deaths) and 8,589 patients for CRC-specific mortality (1,398 deaths). Four machine learning (ML) models (support vector machine, random forest, XGBoost, and LightGBM) were constructed. We employed explainable AI techniques, including SHapley Additive exPlanations (SHAP), to quantify the contribution of each predictor and ensure model interpretability.
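
To make the idea behind SHAP concrete, the following is a minimal sketch of exact Shapley values for a single patient, computed by enumerating every feature subset. It is not the authors' pipeline and does not use the SHAP library: the function names (exact_shapley, toy_risk), the coefficients, the patient values, and the baseline are all hypothetical and chosen only for illustration. Features outside a coalition are replaced by a baseline value, which is one simplifying choice among several.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating feature subsets.

    Features outside a coalition take their baseline value (an assumption;
    other definitions of the value function exist).
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical risk score; coefficients are invented, not from the study."""
    stage, cea, age = features
    # The last term is an interaction: stage matters more when CEA is elevated
    return 0.30 * stage + 0.02 * cea + 0.01 * age + 0.05 * stage * (cea > 5)


patient = [3, 12.0, 64]   # hypothetical: tumor stage, CEA level, age
baseline = [1, 2.0, 60]   # hypothetical reference patient
phi = exact_shapley(toy_risk, patient, baseline)

for name, contribution in zip(["stage", "CEA", "age"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes these attributions readable per predictor. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations or model-specific algorithms.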

Results

All models showed good discrimination (AUC: 0.82–0.94). LightGBM was identified as the best-optimized model, with an AUC of 0.824 [95% CI 0.80–0.85] for all-cause mortality. For CRC-specific mortality, LightGBM again yielded an AUC of 0.867 [95% CI 0.84–0.89]. SHAP revealed tumor stage and carcinoembryonic antigen as the top mortality predictors across ages. Metabolic markers (e.g., hypertension, cholesterol) and liver enzymes were more predictive in younger patients.
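
As an illustration of how an AUC and its 95% confidence interval can be obtained, the sketch below computes the AUC as the probability that a randomly chosen death receives a higher predicted score than a randomly chosen survivor, then derives a percentile bootstrap interval. The abstract does not state how the reported intervals were computed, so the bootstrap is an assumption here, and the data are synthetic. The names auc_score and bootstrap_auc_ci are hypothetical helpers, not part of any library.

```python
import random


def auc_score(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling patients)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data: about 20% events, with scores shifted upward for events
rng = random.Random(42)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(200)]
if len(set(labels)) < 2:
    labels[0], labels[1] = 0, 1  # guard so both classes are present
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

auc = auc_score(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC {auc:.3f} [95% CI {ci_low:.2f}-{ci_high:.2f}]")
```

The pairwise loop is quadratic in the sample size, which is acceptable for a small demonstration; for a cohort of several thousand patients a rank-based computation would be the more practical choice.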

Conclusions

We developed the first interpretable machine learning model that accurately predicts CRC survival in a nationwide Korean cohort. Age-specific risk factors identified by SHAP not only support personalized care but also advance the application of precision oncology in Asian settings.
Source journal
International Journal of Medical Informatics (Medicine; Computer Science: Information Systems)
CiteScore: 8.90
Self-citation rate: 4.10%
Articles published: 217
Review time: 42 days
About the journal: International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings. The scope of the journal covers: information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.; computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.; educational computer-based programs pertaining to medical informatics or medicine in general; and organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.