Explainable Machine Learning Models for Colorectal Cancer Prediction Using Clinical Laboratory Data.

IF 2.5 | Zone 4 (Medicine) | Q3 ONCOLOGY
Cancer Control Pub Date : 2025-01-01 Epub Date: 2025-05-07 DOI:10.1177/10732748251336417
Rui Li, Xiaoyan Hao, Yanjun Diao, Liu Yang, Jiayun Liu
Cancer Control, volume 32, article 10732748251336417. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12062600/pdf/
Citations: 0

Abstract

Introduction: Early diagnosis of colorectal cancer (CRC) poses a significant clinical challenge. This study aims to develop machine learning (ML) models for CRC risk prediction using clinical laboratory data.

Methods: This retrospective, single-center study analyzed laboratory examination data from healthy controls (HC), polyp patients (Polyp), and CRC patients between 2013 and 2023. Five ML algorithms, including adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), decision tree (DT), logistic regression (LR), and random forest (RF), were employed to classify subjects into HC vs Polyp vs CRC, HC vs CRC, and Polyp vs CRC, respectively.

Results: This study included 31 539 subjects: 11 793 HCs, 10 125 polyp patients, and 9621 CRC patients. The XGBoost model achieved the highest AUCs of 0.966 for differentiating HC from CRC and 0.881 for Polyp from CRC, outperforming carcino-embryonic antigen (CEA) and fecal occult blood testing (FOBT) tests. This model could also identify CEA-negative or FOBT-negative CRC patients. Incorporating stool miR-92a detection into the model further improved diagnostic performance. Shapley additive explanations (SHAP) plots indicated that FOBT, CEA, lymphocyte percentage (LYMPH%), and hematocrit (HCT) were the most significant features contributing to CRC diagnosis. Additionally, a computational tool for predicting CRC risk based on the optimal model was developed, designed for researchers with programming experience.

Conclusion: Five ML models for CRC diagnosis, based on ten routine laboratory test items, were developed, achieving higher diagnostic accuracies than traditional CRC biomarkers. The diagnostic capabilities of these ML models can be further enhanced by including stool miR-92a levels.
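The AUC figures quoted in the abstract can be read as the probability that a randomly chosen CRC patient receives a higher score than a randomly chosen control. The sketch below illustrates that definition in plain Python. It is not the authors' tool: the function name `roc_auc`, the toy labels, and both score lists are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = CRC, 0 = healthy control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Illustrative scores from a single biomarker used on its own.
single_marker = [0.9, 0.4, 0.35, 0.2, 0.5, 0.3, 0.1, 0.05]
# Illustrative scores from a model that combines several laboratory items.
model_score = [0.95, 0.8, 0.6, 0.45, 0.5, 0.3, 0.2, 0.1]

print(f"single marker AUC: {roc_auc(labels, single_marker):.4f}")  # 0.7500
print(f"combined model AUC: {roc_auc(labels, model_score):.4f}")   # 0.9375
```

On this toy data the combined score ranks more CRC cases above controls than the single marker does, which is the kind of comparison the abstract reports between the XGBoost model and CEA or FOBT alone. The pairwise loop is quadratic in sample size, so for a cohort of the size described a rank-based implementation from an established library would be the practical choice.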

Source journal
Cancer Control (ONCOLOGY)
CiteScore: 3.80
Self-citation rate: 0.00%
Articles published: 148
Review time: >12 weeks
Journal description: Cancer Control is a JCR-ranked, peer-reviewed open access journal whose mission is to advance the prevention, detection, diagnosis, treatment, and palliative care of cancer by enabling researchers, doctors, policymakers, and other healthcare professionals to freely share research along the cancer control continuum. Our vision is a world where gold-standard cancer care is the norm, not the exception.