Interpretable Machine Learning for Predicting Neoadjuvant Chemotherapy Response in Breast Cancer Using the Baseline Clinical and Pathological Characteristics
Authors: Shan Fang, Jun Zhang, Chengyan Han, Mingxiang Kong, Haibo Zhang, Miaochun Zhong, Wuzhen Chen, Hongjun Yuan, Wenjie Xia, Wei Zhang
Journal: Cancer Medicine, volume 14, issue 17 (published 2025-09-08)
DOI: 10.1002/cam4.71221
Background
The pathological response to neoadjuvant chemotherapy (NAC) has become a vital prognostic indicator for patients with breast cancer (BC). Recently developed prediction models have relied on rather basic imaging and pathology characteristics and have not sufficiently explained the importance of the incorporated data. This study aimed to develop and validate a machine learning (ML) model for predicting pathological complete response (pCR) to NAC in BC patients using baseline clinical and pathological features.
Methods
Data were collected from hospitalized BC patients treated with NAC at Zhejiang Provincial People's Hospital between January 2014 and August 2023. The dataset was randomly split, with 70% allocated to model training and 30% to validation. LASSO regression was used to select predictive features. Six ML models were developed: XGBoost, LightGBM, CatBoost, logistic regression, random forest (RF), and support vector machine (SVM). Performance was assessed using the area under the curve (AUC), accuracy, precision, recall, F1 score, and Brier score. Clinical benefit was evaluated using decision curve analysis (DCA), and SHapley Additive exPlanations (SHAP) was applied to interpret the features of the optimal ML model.
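Three of the evaluation quantities named in the Methods (AUC, Brier score, and the net benefit plotted in a decision curve) have short closed-form definitions. The following is a minimal sketch of those definitions in plain Python, not the authors' code. The function names and the eight toy predictions are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive case outranks a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit at one threshold probability: TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data: 1 = pCR, 0 = no pCR (not from the study).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]

print(round(roc_auc(y_true, y_prob), 3))      # 0.933
print(round(brier_score(y_true, y_prob), 3))  # 0.125
print(net_benefit(y_true, y_prob, 0.5))       # 0.25
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies. A lower Brier score and a higher AUC indicate better calibration-plus-accuracy and better discrimination, respectively.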
Results
A total of 303 BC patients treated with NAC were included, with a pCR rate of 29.37% (89/303). Twelve features were selected for model construction, including age, menopausal status, PR, HER2 status, Ki-67 expression, and stromal tumor-infiltrating lymphocytes (sTILs). Among the six models, the CatBoost model demonstrated the best predictive performance, achieving an AUC of 0.853 after Bayesian hyperparameter tuning. SHAP analysis ranked sTILs as the most critical predictive feature. In fivefold cross-validation, the CatBoost model incorporating sTILs achieved an average AUC of 0.83.
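The feature ranking reported above rests on Shapley values, which attribute a prediction to each feature by averaging that feature's marginal contribution over all orders in which features can be added. The sketch below computes exact Shapley values by brute-force enumeration for a hypothetical three-feature scoring function. The function `toy_score`, its coefficients, and the feature encoding are invented for illustration and are not the study's CatBoost model. Enumeration is only feasible for a handful of features, and the SHAP library uses more efficient algorithms in practice.

```python
from itertools import permutations
from typing import Callable, Dict


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Each feature's value is its marginal contribution averaged over every
    ordering in which features are switched from baseline to observed values.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orders) for name, total in totals.items()}


def toy_score(features: Dict[str, float]) -> float:
    """Hypothetical score with an sTILs x HER2 interaction (coefficients invented)."""
    return (
        0.1
        + 0.4 * features["stils"]
        + 0.2 * features["her2"]
        + 0.05 * features["ki67"]
        + 0.1 * features["stils"] * features["her2"]
    )


patient = {"stils": 1.0, "her2": 1.0, "ki67": 1.0}
reference = {"stils": 0.0, "her2": 0.0, "ki67": 0.0}

phi = shapley_values(toy_score, patient, reference)
for name, value in sorted(phi.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

In this toy case the interaction term is shared equally between `stils` and `her2`, and the three values sum to the difference between the patient's score and the reference score, which is the property that makes such attributions additive.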
Conclusions
The ML-based model enables more accurate baseline prediction of pCR in BC patients, which can help optimize treatment strategies. Additionally, the interpretable SHAP framework enhances model transparency, fostering clinical trust and understanding among doctors.
About the journal:
Cancer Medicine is a peer-reviewed, open access, interdisciplinary journal providing rapid publication of research from global biomedical researchers across the cancer sciences. The journal will consider submissions from all oncologic specialties, including, but not limited to, the following areas:
Clinical Cancer Research:
Translational research ∙ clinical trials ∙ chemotherapy ∙ radiation therapy ∙ surgical therapy ∙ clinical observations ∙ clinical guidelines ∙ genetic consultation ∙ ethical considerations.
Cancer Biology:
Molecular biology ∙ cellular biology ∙ molecular genetics ∙ genomics ∙ immunology ∙ epigenetics ∙ metabolic studies ∙ proteomics ∙ cytopathology ∙ carcinogenesis ∙ drug discovery and delivery.
Cancer Prevention:
Behavioral science ∙ psychosocial studies ∙ screening ∙ nutrition ∙ epidemiology and prevention ∙ community outreach.
Bioinformatics:
Gene expression profiles ∙ gene regulation networks ∙ genome bioinformatics ∙ pathway analysis ∙ prognostic biomarkers.
Cancer Medicine publishes original research articles, systematic reviews, meta-analyses, and research methods papers, along with invited editorials and commentaries. Original research papers must report well-conducted research with conclusions supported by the data presented in the paper.