Comparison of machine learning and deep learning models for survival prediction in early-stage hormone receptor-positive/HER2-negative breast cancer receiving neoadjuvant chemotherapy
L. Mastrantoni, G. Garufi, G. Giordano, N. Maliziola, E. Di Monte, G. Arcuri, V. Frescura, A. Rotondi, A. Orlandi, L. Carbognin, A. Palazzo, L. Pontolillo, A. Fabi, S. Pannunzio, I. Paris, F. Marazzi, A. Franco, G. Franceschini, G. Scambia, D. Giannarelli, E. Bria
ESMO Real World Data and Digital Oncology, volume 10, Article 100184. Published 2025-09-16. DOI: 10.1016/j.esmorw.2025.100184
https://www.sciencedirect.com/science/article/pii/S2949820125000736
Abstract
Background
We compared machine learning (ML) and deep learning (DL) models to predict disease-free survival (DFS) and overall survival (OS) in patients with hormone receptor (HR)-positive/human epidermal growth factor receptor 2 (HER2)-negative breast cancer (BC) receiving neoadjuvant chemotherapy (NACT), using routine clinicopathological features before and after surgery.
Materials and methods
In this retrospective cohort, 572 patients with stage I-III HR-positive/HER2-negative BC treated with anthracycline/taxane-based NACT and surgery were analyzed. Data were split into training (n = 463) and validation (n = 109) sets. Five ML models (random survival forest, extra survival tree, gradient boosting machine, support vector machine, regularized Cox) and four neural networks (DeepSurv, DeepHit, logistic hazard, multi-task logistic regression) were trained via five-fold cross-validation. Performance was assessed on the validation cohort by the C-index and integrated Brier score (iBS).
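For readers unfamiliar with the C-index used above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only, not the authors' implementation: the function name `harrell_c_index`, the toy inputs, and the simplified handling of tied event times (such pairs are skipped) are assumptions made here. A pair of patients is treated as comparable when the one with the shorter follow-up had an observed event; the pair is concordant when that patient also received the higher predicted risk.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       : follow-up time per patient
    events      : 1 if the event was observed, 0 if censored
    risk_scores : model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            # Comparable pair: patient i had an event strictly before time j.
            # Pairs with tied times are skipped in this simplified version.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event, higher risk: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy data: four patients, two observed events.
toy_times = [5, 10, 12, 20]
toy_events = [1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.1, 0.6]
print(harrell_c_index(toy_times, toy_events, toy_risk))  # 3 of 4 comparable pairs concordant: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-indices should be read. The integrated Brier score additionally measures calibration over time and needs censoring weights, so it is not sketched here.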
Results
Median age was 49 years and the pathological complete response (pCR) rate was 15%. Median DFS was 103 months [95% confidence interval (CI) 84.4 months to not estimable (NE)], and 5-year OS was 78.6% (95% CI 74.8% to 82.5%). DeepSurv yielded the best overall performance, with a C-index of 0.70 (95% CI 0.60-0.78; iBS 0.22) for DFS and 0.68 (95% CI 0.56-0.79; iBS 0.17) for OS. The top ML model achieved C-indices of 0.64 (DFS) and 0.68 (OS). Key predictors were nodal status, estrogen receptor/progesterone receptor expression, tumor size, Ki-67 and pCR.
Conclusions
Both ML and DL models predicted survival post-NACT in HR-positive/HER2-negative BC, suggesting that simple models can perform as well as DL architectures in small datasets. The marginally higher discrimination of DL models should be weighed against computational demands and lower interpretability compared with ML methods.