OncoE25: an AI model for predicting postoperative prognosis in early-onset stage I-III colon and rectal cancer: a population-based study using SEER with dual-center cohort validation.
Luyun Yuan, Liyu Wang, Jiamin Gao, Xin Chen, Haoyue Wang, Wei Shan Tan, Kexiang Sun, Yabin Gong, Wanli Deng. Journal of Translational Medicine 23(1): 695. Published 2025-06-22. DOI: 10.1186/s12967-025-06663-4. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12183820/pdf/
Abstract
Background: Although the overall incidence of colorectal cancer (CRC) is declining, the incidence of early-onset CRC is rising. No prognostic models currently exist for predicting postoperative survival in patients with stage I-III early-onset colon or rectal cancer, and such tools are urgently needed to enable individualized risk assessment.
Methods: We identified patients with early-onset (EO) and late-onset (LO) colon or rectal cancer in the Surveillance, Epidemiology, and End Results (SEER) database and randomly split them 7:3 into training and test cohorts. External cohorts of EO colon and rectal cancer were collected from two Chinese hospitals. After LASSO-Cox feature selection, six models were developed to predict cancer-specific survival (CSS): random survival forest (RSF), LASSO-Cox, survival support vector machine (S-SVM), XGBoost survival embeddings (XGBSE), gradient boosting survival analysis (GBSA), and DeepSurv. Performance was assessed using the concordance index (C-index), Brier score, time-dependent area under the curve (AUC), calibration curves, and decision curves. SHapley Additive exPlanations (SHAP) were used for model interpretation. A risk stratification system and an online calculator were built on the best-performing model.
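The C-index measures how well predicted risk scores rank patients by observed survival time. The abstract does not state which estimator was used, so the following is only a minimal pure-Python sketch assuming Harrell's original definition: a pair is comparable when the patient with the shorter follow-up had the event (pairs with tied times are skipped for simplicity), and tied risk scores count as half-concordant. The function name `harrell_c_index` and the toy data are hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C-index for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when times[i] < times[j] and subject i had
    the event. It is concordant when subject i also has the higher predicted
    risk; tied risks add 0.5. Pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy cohort: follow-up times, event flags (1 = cancer death), risk scores.
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    print(harrell_c_index(times, events, [0.9, 0.6, 0.4, 0.2]))  # 1.0: perfect ranking
    print(harrell_c_index(times, events, [0.9, 0.2, 0.4, 0.6]))  # 0.6: 3 of 5 pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; this is the scale on which test-cohort values such as 0.738 (EO colon) and 0.728 (EO rectal) should be read.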
Results: A total of 3,997 EO colon cancer, 2,016 EO rectal cancer, 30,621 LO colon cancer, and 8,667 LO rectal cancer patients from SEER, along with 205 EO colon cancer and 153 EO rectal cancer patients from the two Chinese institutions, were included in the study. Across multiple datasets and metrics, the RSF model showed the best and most stable performance, outperforming both the other machine learning models and the traditional TNM staging system. In EO colon cancer, the RSF model achieved C-indices of 0.738 (test cohort) and 0.829 (external validation), mean AUCs of 0.765 and 0.889, and integrated Brier scores of 0.084 and 0.077, respectively. For EO rectal cancer, the C-indices were 0.728 and 0.722, the mean AUCs were 0.753 and 0.900, and the integrated Brier scores were 0.106 and 0.095, respectively. Calibration and decision curves further confirmed the RSF model's good calibration and clinical net benefit. The RSF model also performed robustly in the late-onset CRC (LOCRC) cohorts. SHAP analysis was used to quantify the marginal contribution of each predictor within each cancer subtype. Based on the RSF model, we developed a CSS-based risk stratification framework and deployed an online prediction tool.
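The integrated Brier scores reported above summarize the squared error between predicted survival probabilities and observed survival status over a time window. The abstract does not specify the estimator or the time grid, so this sketch assumes the widely used inverse-probability-of-censoring-weighted (IPCW) Brier score, with a Kaplan-Meier estimate of the censoring distribution, integrated by the trapezoidal rule and divided by the window length. The helper names (`censoring_km`, `brier_score`, `integrated_brier_score`) and the toy data are hypothetical; each patient's prediction is represented as a callable returning S(t).

```python
from typing import Callable, Sequence

SurvFn = Callable[[float], float]  # maps time t to predicted survival probability S(t)


def censoring_km(times: Sequence[float], events: Sequence[int],
                 t: float, strict: bool = False) -> float:
    """Kaplan-Meier estimate G of the censoring survival function.

    Censoring (event == 0) is treated as the 'event'. With strict=True the
    product runs over times < t, giving the left limit G(t-).
    """
    g = 1.0
    for s in sorted(set(times)):
        if s > t or (strict and s == t):
            break
        at_risk = sum(1 for x in times if x >= s)
        censored = sum(1 for x, e in zip(times, events) if x == s and e == 0)
        g *= 1.0 - censored / at_risk
    return g


def brier_score(times: Sequence[float], events: Sequence[int],
                surv: Sequence[SurvFn], t: float) -> float:
    """IPCW Brier score at time t (illustrative sketch)."""
    g_t = censoring_km(times, events, t)
    total = 0.0
    for ti, ei, s in zip(times, events, surv):
        p = s(t)
        if ti <= t and ei == 1:
            # event observed by t: ideal prediction is S(t) = 0
            total += p ** 2 / censoring_km(times, events, ti, strict=True)
        elif ti > t:
            # still event-free at t: ideal prediction is S(t) = 1
            total += (1.0 - p) ** 2 / g_t
        # subjects censored at or before t contribute 0; the IPCW weights
        # on the other terms compensate for them
    return total / len(times)


def integrated_brier_score(times: Sequence[float], events: Sequence[int],
                           surv: Sequence[SurvFn], grid: Sequence[float]) -> float:
    """Trapezoidal integral of the Brier score over grid, divided by its span."""
    if len(grid) < 2:
        raise ValueError("grid needs at least two time points")
    scores = [brier_score(times, events, surv, t) for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2
               for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])


if __name__ == "__main__":
    times = [1, 2, 3, 4]
    events = [1, 0, 1, 1]           # patient 2 censored at t = 2
    const = [lambda t: 0.5] * 4     # uninformative model: 50% survival for everyone
    ibs = integrated_brier_score(times, events, const, [1, 2, 3])
    print(round(ibs, 6))            # 0.25
```

On this toy data, a model that predicts 50% survival for every patient scores 0.25 at each grid time, a useful reference point when reading integrated Brier scores; lower values are better.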
Conclusions: We selected the RSF model for its superior predictive performance and named it OncoE25; it is intended to support personalized health management for patients with EO colon and rectal cancer.
About the journal:
The Journal of Translational Medicine is an open-access journal that publishes articles focusing on information derived from human experimentation to enhance communication between basic and clinical science. It covers all areas of translational medicine.