Prognostic prediction models for postoperative patients with stage I to III colorectal cancer based on machine learning.

IF 2.5 · Zone 4, Medicine · JCR Q2, GASTROENTEROLOGY & HEPATOLOGY
Xiao-Lin Ji, Shuo Xu, Xiao-Yu Li, Jin-Huan Xu, Rong-Shuang Han, Ying-Jie Guo, Li-Ping Duan, Zi-Bin Tian
World Journal of Gastrointestinal Oncology, 2024-12-15; 16(12): 4597-4613. DOI: 10.4251/wjgo.v16.i12.4597
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11577370/pdf/
Citations: 0

Abstract

Background: Colorectal cancer (CRC) is characterized by high heterogeneity, aggressiveness, and high morbidity and mortality rates. Machine learning (ML) algorithms can use patient, tumor, and treatment features to develop and validate models that predict survival. They can also screen for important variables and support practical tools that may serve as important references for clinical decision-making and potentially improve patient outcomes.

Aim: To construct prognostic prediction models and screen important variables for patients with stage I to III CRC.

Methods: More than 1000 postoperative CRC patients were grouped into three survival-time categories (cutoffs at 3 years and 5 years) and assigned to training and testing cohorts at a 7:3 ratio. For each three-category survival-time outcome, predictions were made with 4 ML algorithms. Each algorithm was trained on both the all-variable dataset and the important-variable-only dataset and validated via 5-fold cross-validation and bootstrap validation. Important variables were screened with multivariable regression methods. Model performance before and after variable screening was evaluated and compared using the area under the curve (AUC). SHapley Additive exPlanations (SHAP) further illustrated how the important variables influenced model decision-making. Nomograms were constructed for practical application of the models.
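The outcome labeling, the 7:3 split, and the 5-fold cross-validation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code.

- **From the abstract:** the 3- and 5-year cutoffs, the 7:3 ratio, and the 5 folds.
- **Assumed:** the function names `survival_category`, `stratified_split`, and `kfold` are hypothetical.
- **Assumed:** exactly 3.0 years falls in the middle category and exactly 5.0 years in the top one.
- **Assumed:** the split is stratified by class, and the seed is arbitrary.
- **Not handled:** the abstract does not say how patients censored before a cutoff were treated, so censoring is ignored here.
- **Not shown:** the four ML algorithms and the regression-based variable screening, since the abstract does not name them.

```python
import random


def survival_category(years, cutoffs=(3.0, 5.0)):
    """Map survival time (years) to 0 (<3 y), 1 (3 to <5 y) or 2 (>=5 y).

    Assumption: which side of each cutoff the boundary value falls on
    is not stated in the source. Censoring is not handled.
    """
    low, high = cutoffs
    if years < low:
        return 0
    if years < high:
        return 1
    return 2


def stratified_split(labels, test_frac=0.3, seed=42):
    """Split indices about 7:3 into train/test, keeping class proportions similar.

    Stratification and the seed are assumptions, not from the source.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def kfold(indices, k=5, seed=42):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield fit, folds[i]


# Example with hypothetical survival times in years (not study data)
times = [1.2, 2.9, 3.0, 4.5, 5.0, 7.1]
print([survival_category(t) for t in times])  # [0, 0, 1, 1, 2, 2]
```

In a full pipeline, each candidate model would be fit on the `fit` indices of every fold and scored on the matching validation fold. The final models would then be evaluated once on the held-out test indices.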

Results: Our ML models performed well. Model performance was consistent before and after identification of the important variables, and variable screening was effective. In the testing set, the highest pre- and post-screening AUCs (95% confidence intervals) were 0.87 (0.81-0.92) and 0.89 (0.84-0.93) for overall survival, 0.75 (0.69-0.82) and 0.73 (0.64-0.81) for disease-free survival, 0.95 (0.88-1.00) and 0.88 (0.75-0.97) for recurrence-free survival, and 0.76 (0.47-0.95) and 0.80 (0.53-0.94) for distant metastasis-free survival. Repeated cross-validation and bootstrap validation were performed in both the training and testing datasets. The SHAP values of the important variables were consistent with the clinicopathological characteristics of the patients. Nomograms were constructed.
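The abstract reports test-set AUCs with 95% confidence intervals. It does not say how the AUC was computed for a three-category outcome or how the intervals were derived.

The sketch below therefore assumes two things:

- one-vs-rest AUCs averaged over the three classes (macro average);
- a percentile bootstrap that resamples patients.

The function names are hypothetical and the authors' procedure may differ. Only the Python standard library is used, so the example runs without extra dependencies.

```python
import random
from statistics import mean


def binary_auc(scores, labels):
    """AUC as the probability that a random positive case scores higher than a
    random negative case (ties count as half); equals Mann-Whitney U / (n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, y, n_classes=3):
    """Assumed multiclass AUC: one-vs-rest AUC per class, then the unweighted mean."""
    return mean(
        binary_auc([row[c] for row in prob_rows], [1 if t == c else 0 for t in y])
        for c in range(n_classes)
    )


def bootstrap_auc_ci(prob_rows, y, n_classes=3, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) CI, resampling patients."""
    if len(set(y)) < n_classes:
        raise ValueError("every class must appear in the evaluation set")
    rng = random.Random(seed)
    n = len(y)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < n_classes:  # a class is missing: AUC undefined, redraw
            continue
        stats.append(macro_ovr_auc([prob_rows[i] for i in idx], yb, n_classes))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return macro_ovr_auc(prob_rows, y, n_classes), lower, upper
```

Under a percentile bootstrap, the interval width depends mainly on how many patients fall in each class. This is one plausible reason why some of the reported intervals are much wider than others, although the abstract does not give per-class counts.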

Conclusion: We constructed a comprehensive, high-accuracy ML architecture based on important variables for predicting three-category survival times. This architecture could serve as an important reference for managing CRC patients.

Source journal: World Journal of Gastrointestinal Oncology (Medicine-Gastroenterology)
CiteScore: 4.20
Self-citation rate: 3.30%
Articles published: 1082
Journal description: The World Journal of Gastrointestinal Oncology (WJGO) is a leading academic journal devoted to reporting the latest, cutting-edge research progress and findings of basic research and clinical practice in the field of gastrointestinal oncology.