Zi-Ran Zhang, Chao-Xian Wang, Huan Wang, Si-Li Jin
American Journal of Cancer Research, vol. 15, no. 6, pp. 2482-2499. Published 2025-06-15. Journal Article. DOI: 10.62347/MHSV3723 (https://doi.org/10.62347/MHSV3723). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12256414/pdf/
Machine learning-based prediction of disease-free survival in breast cancer patients with non-pathological complete response after neoadjuvant chemotherapy: a retrospective multicenter cohort study.
This study aimed to construct a robust machine learning (ML) model to predict disease-free survival (DFS) and stratify risk in breast cancer (BC) patients with non-pathological complete response (non-PCR) after neoadjuvant chemotherapy (NAC), so that early interventions can be initiated for high-risk patients. This retrospective multicenter cohort study included BC patients from two hospitals in China who received NAC but did not achieve PCR. Four ML algorithms were used to construct models from patients' clinicopathological data, and the models' performance was then evaluated. To improve interpretability, the Shapley additive explanations (SHAP) method was used to analyze the contribution of each feature to the predictions. A total of 463 non-PCR patients were included. Of these, 385 patients from Ruijin Hospital, affiliated with Shanghai Jiao Tong University, were randomly split 3:1 into a training cohort and an internal validation cohort for model development and preliminary performance evaluation. The remaining 78 patients, enrolled from Jiaxing Women and Children's Hospital, formed the external validation cohort used to evaluate the model's generalizability. Univariate and multivariate Cox regression analyses showed that age, residual tumor size, Ki67 change, molecular subtype, and axillary lymph node metastasis were independent factors influencing DFS. Among the four ML models, the random survival forest (RSF) model performed best, with a concordance index of 0.820 in the training cohort, 0.642 in the internal validation cohort, and 0.689 in the external validation cohort. Further analysis showed that the RSF model had excellent discriminative ability, with a high area under the curve, and a low Brier score, indicating excellent calibration. Decision curve analysis indicated that the RSF model offered a higher clinical net benefit at various time points and effectively stratified risk, identifying high-risk patients. SHAP analysis identified residual tumor size as the most influential predictive feature. The RSF model can effectively predict DFS and risk in BC patients with non-PCR following NAC, offering a critical reference for developing individualized treatment strategies.
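The concordance index reported in the abstract can be illustrated with a minimal sketch. The code below computes Harrell's C-index for right-censored data in plain Python; it is not the authors' implementation, and the abstract does not state which variant or software they used. The function name `harrell_c_index` and the toy follow-up data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data.

    times:       follow-up time for each patient
    events:      1 if the event (e.g. recurrence) was observed, 0 if censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a comparable pair needs an observed event at the earlier time
        for j in range(n):
            if times[i] < times[j]:  # patient i failed before patient j's follow-up ended
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier failure was ranked as higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (illustration only, not from the study).
toy_times = [5, 8, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.7, 0.3, 0.1]

print(harrell_c_index(toy_times, toy_events, toy_risk))  # 0.875
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.820, 0.642, and 0.689 are read.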
About the journal:
The American Journal of Cancer Research (AJCR) (ISSN 2156-6976) is an independent, open access, online-only journal that facilitates rapid dissemination of novel discoveries in the basic science and treatment of cancer. It was founded by a group of cancer research scientists and clinical academic oncologists from around the world who are devoted to advancing our understanding of cancer and its treatment. AJCR's scope encompasses multidisciplinary research from any scientific discipline whose primary focus is to increase and integrate knowledge about the etiology and molecular mechanisms of carcinogenesis, with the ultimate aim of advancing the cure and prevention of this increasingly devastating disease. To achieve these aims, AJCR publishes review articles, original articles, and new techniques in cancer research and therapy, as well as hypotheses, case reports, and letters to the editor. Unlike most other open access online journals, AJCR keeps most of the traditional features of print journals, such as continuous volume and issue numbers and continuous page numbers, to retain the familiar feel of an academic journal.