Integrating deep learning features from mammography with SHAP values for a machine learning model predicting over 5-year recurrence of breast ductal carcinoma In Situ post-lumpectomy.
Yupeng Sha, Quan Yuan, Yi Du, Shuqi Yang, Ming Niu, Xiaoshuan Liang, Shanshan Sun, Tong Li, Shu Gong, Jiguang Han
Frontiers in Immunology, 16, 1681072 (published 2025-09-15). DOI: 10.3389/fimmu.2025.1681072 (https://doi.org/10.3389/fimmu.2025.1681072)
Abstract
Background: Among women with ductal carcinoma in situ (DCIS) who undergo breast-conserving surgery, a subset will still progress to invasive breast cancer (IBC). Mammograms offer rich tumor information for patient stratification, but current prediction methods focus on clinicopathological factors and overlook imaging information.
Methods: We retrospectively analyzed 140 DCIS patients from Harbin Medical University Cancer Hospital (2011-2020, with follow-up through 2025). Preoperative digital mammograms and clinicopathological data were collected, and mammographic features were extracted with pyradiomics under the supervision of a senior radiologist. Feature selection used LASSO regression with 10-fold cross-validation. The dataset was split into a training set (n=100) and a validation set (n=40), a 5:2 ratio. Sixteen machine learning algorithms combining mammographic deep learning features with clinicopathological variables were developed and compared for predicting DCIS recurrence. Model performance was assessed with receiver operating characteristic (ROC) curves, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and SHAP (SHapley Additive exPlanations) values were used for interpretation.
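The Methods describe feature selection by LASSO regression with 10-fold cross-validation but give no implementation details. The standard-library Python sketch below illustrates the general idea under stated assumptions: it minimizes the squared-error LASSO objective (1/2n)·||y - Xβ||² + λ·||β||₁ by cyclic coordinate descent, assumes centred columns with no intercept, and picks λ from a user-supplied grid by the lowest mean held-out error. A binary recurrence outcome would more typically use a logistic (binomial) LASSO, as in glmnet; the function names, λ grid, and synthetic data are hypothetical and are not the authors' code.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the response y are centred (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: leave its coefficient at zero
            # (1/n) * x_j . (residual with feature j's contribution added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
            beta[j] = new_bj
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of a linear model without intercept."""
    preds = [sum(x * b for x, b in zip(row, beta)) for row in X]
    return sum((t - q) ** 2 for t, q in zip(y, preds)) / len(y)


def cv_select_lambda(X, y, lambdas, k=10, seed=0):
    """Return the lambda from `lambdas` with the lowest mean held-out MSE over k folds."""
    if not 2 <= k <= len(X):
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            # For brevity, folds reuse the globally centred data (no per-fold re-centring).
            beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            err += mse([X[i] for i in fold], [y[i] for i in fold], beta)
        err /= k
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam


if __name__ == "__main__":
    # Synthetic data (hypothetical): only features 0 and 2 carry signal.
    rng = random.Random(42)
    n, p = 100, 5
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1.5 * row[0] - 1.0 * row[2] + rng.gauss(0, 0.5) for row in X]
    # Centre each column and the response, as lasso_cd assumes.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    y = [t - y_mean for t in y]

    lam = cv_select_lambda(X, y, [0.01, 0.05, 0.1, 0.2, 0.5], k=10)
    beta = lasso_cd(X, y, lam)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print("lambda:", lam, "selected features:", selected)
```

Features whose coefficients are shrunk exactly to zero at the chosen λ drop out; the remaining features would then feed the downstream classifiers.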
Results: The Gradient Boosting Machine (GBM) algorithm performed best, with an AUC of 0.918 (95% CI 0.873-0.963) in the test set. SHAP values indicated that the mammographic signature (MS) was the most important predictor, followed by the Ki-67 index and histological grade. Patients who did not receive radiotherapy had higher recurrence rates than those who did. Decision curve analysis supported the model's clinical utility across a range of risk thresholds.
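The reported metrics can all be computed from a model's predicted recurrence probabilities. The sketch below is a minimal standard-library illustration with hypothetical function names and synthetic data. The AUC is the Mann-Whitney probability that a randomly chosen recurrence case scores above a randomly chosen non-recurrence case. The 95% CI comes from a percentile bootstrap, which is one common option; the abstract does not say which CI method was used, and DeLong's method is another. The 0.5 cut-off for sensitivity, specificity, PPV, and NPV is an assumption, and decision-curve net benefit uses the standard formula TP/n - (FP/n)·pt/(1 - pt) at threshold probability pt.

```python
import random


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores above a random
    negative case (Mann-Whitney statistic); ties count as one half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, PPV and NPV when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the AUC (0/1 labels)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain a single class
            aucs.append(roc_auc(yb, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(n_boot * alpha / 2)]
    upper = aucs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


def net_benefit(y_true, scores, pt):
    """Decision-curve net benefit of treating patients whose score >= pt."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= pt)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= pt)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference curve: net benefit if every patient is treated as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


if __name__ == "__main__":
    # Synthetic labels and scores for illustration only (not study data).
    rng = random.Random(7)
    y = [1] * 10 + [0] * 30
    p_hat = [min(1.0, max(0.0, rng.gauss(0.65 if t else 0.3, 0.15))) for t in y]
    print("AUC:", round(roc_auc(y, p_hat), 3))
    print("95% CI (bootstrap):", bootstrap_auc_ci(y, p_hat))
    print(confusion_metrics(y, p_hat))
    for pt in (0.1, 0.2, 0.3, 0.4):
        print(pt, round(net_benefit(y, p_hat, pt), 3), round(net_benefit_treat_all(y, pt), 3))
```

A decision curve plots `net_benefit` against pt together with the treat-all curve and the treat-none line (net benefit 0); a model is useful at thresholds where its curve lies above both.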
Conclusion: We developed an interpretable GBM model that combines mammographic and clinical data to predict DCIS recurrence (AUC = 0.918). The key predictors were the mammographic signature, Ki-67, and tumor grade, giving clinicians a practical tool for personalized postoperative management.
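SHAP explains an individual prediction by assigning each feature its Shapley value: the weighted average of that feature's marginal contribution over all coalitions of the other features. The sketch below computes exact Shapley values by enumeration, filling features that are absent from a coalition with values from a single baseline point. This is an illustrative simplification, not the TreeSHAP algorithm that SHAP libraries typically apply to gradient-boosted trees, and not the authors' pipeline; the toy scoring function and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x relative to a baseline point.

    A feature "absent" from a coalition takes its baseline value. Each feature
    needs 2**p model evaluations, so this only suits a handful of features.
    """
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return f(z)

    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions S of this size
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical toy risk score with an interaction between features 0 and 1.
    def toy_score(z):
        return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[1] + 0.1 * z[2]

    patient = [1.2, 0.8, 2.0]
    reference = [0.0, 0.0, 0.0]
    phi = shapley_values(toy_score, patient, reference)
    print([round(v, 3) for v in phi])
    # Additivity: the attributions sum to f(x) - f(baseline).
    print(round(sum(phi), 6), round(toy_score(patient) - toy_score(reference), 6))
```

Averaging the absolute attributions over many patients yields a global importance ranking of the kind reported in the Results.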