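Interpretable Machine Learning for Predicting Neoadjuvant Chemotherapy Response in Breast Cancer Using the Baseline Clinical and Pathological Characteristics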

IF 3.1 · Zone 2 (Medicine) · JCR Q2 ONCOLOGY

Cancer Medicine Pub Date: 2025-09-08 DOI: 10.1002/cam4.71221

Shan Fang, Jun Zhang, Chengyan Han, Mingxiang Kong, Haibo Zhang, Miaochun Zhong, Wuzhen Chen, Hongjun Yuan, Wenjie Xia, Wei Zhang


Citations: 0

Abstract



Background

The pathological response to neoadjuvant chemotherapy (NAC) has become a vital prognostic indicator for patients with breast cancer (BC). Recently developed prediction models have relied on rather basic imaging and pathology characteristics and have not sufficiently explained the importance of the data they incorporate. This study aimed to develop and validate a machine learning (ML) model for predicting pathological complete response (pCR) to NAC in BC patients using baseline clinical and pathological features.

Methods

Data were collected from hospitalized BC patients treated with NAC at Zhejiang Provincial People's Hospital between January 2014 and August 2023. The dataset was randomly split, with 70% allocated to model training and 30% to validation. LASSO regression was used to select predictive features. Six ML models (XGBoost, LightGBM, CatBoost, logistic regression, random forest [RF], and support vector machine [SVM]) were developed, and their performance was assessed using the area under the curve (AUC), accuracy, precision, recall, F1 score, and Brier score. Clinical benefit was evaluated using decision curve analysis (DCA), and SHapley Additive exPlanations (SHAP) was applied to interpret the features of the optimal ML model.
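The evaluation measures named in the Methods can be written out in a few lines. The sketch below is only an illustration of the standard definitions, not the authors' code: the toy labels and probabilities are made up, the 0.5 classification threshold is an assumption, and AUC is taken here to mean the area under the ROC curve. The net-benefit functions follow the usual decision-curve formula, TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 at a fixed threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit of acting on predictions >= threshold."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy in DCA: treat every patient as a predicted responder."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data (1 = pCR, 0 = no pCR); not taken from the study.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5]

print(classification_metrics(y_true, y_prob))
print("AUC:", roc_auc(y_true, y_prob))
print("Brier:", brier_score(y_true, y_prob))
print("Net benefit at pt=0.3:", net_benefit(y_true, y_prob, 0.3))
print("Treat-all at pt=0.3:", net_benefit_treat_all(y_true, 0.3))
```

A decision curve repeats the last two calls over a range of threshold probabilities; a model is considered clinically useful at thresholds where its net benefit exceeds both the treat-all line and zero (treat none).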

Results

A total of 303 BC patients treated with NAC were included, with a pCR rate of 29.37% (89/303). Twelve features, including age, menopausal status, PR, HER2 status, Ki-67 expression, and stromal tumor-infiltrating lymphocytes (sTILs), were selected for model construction. Among the six models, the CatBoost model demonstrated the best predictive performance, achieving an AUC of 0.853 after Bayesian hyperparameter tuning. SHAP analysis ranked sTILs as the most critical predictive feature. In fivefold cross-validation, the CatBoost model incorporating sTILs achieved an average AUC of 0.83.
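As a rough illustration of how an average AUC from fivefold cross-validation is obtained, the sketch below splits a synthetic dataset into five stratified folds, fits a model on four of them, scores the held-out fold, and averages the five AUCs. Everything in it is an assumption made for the example: the data are randomly generated, the fold assignment and seed are arbitrary, and a simple class-centroid scorer stands in for CatBoost so that the code needs only the standard library.

```python
import random
from statistics import mean


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that keep roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    return folds


def fit_centroid_scorer(X, y):
    """Stand-in model: project a sample onto the line joining the class centroids."""
    dims = range(len(X[0]))
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    mu_pos = [mean(x[d] for x in pos) for d in dims]
    mu_neg = [mean(x[d] for x in neg) for d in dims]
    direction = [p - n for p, n in zip(mu_pos, mu_neg)]
    return lambda x: sum(w * v for w, v in zip(direction, x))


def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, k=5, seed=0):
    """Return the held-out AUC of each fold; the model is refit for every fold."""
    aucs = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        score = fit_centroid_scorer([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in held_out], [score(X[i]) for i in held_out]))
    return aucs


# Synthetic data: 30 positives and 70 negatives, two features, only the first informative.
rng = random.Random(42)
y = [1] * 30 + [0] * 70
X = [[rng.gauss(1.0 if t else 0.0, 1.0), rng.gauss(0.0, 1.0)] for t in y]

fold_aucs = cross_validated_auc(X, y)
mean_auc = mean(fold_aucs)
print("per-fold AUC:", [round(a, 3) for a in fold_aucs])
print("mean AUC:", round(mean_auc, 3))
```

Stratifying the folds keeps the proportion of responders similar in every fold, which matters when the positive class is a minority, as it is with a pCR rate near 30%.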

Conclusions

The ML-based model enables more accurate prediction of pCR for BC patients at baseline, which can help optimize treatment strategies. Additionally, the interpretable SHAP framework enhances model transparency, fostering trust and understanding among clinicians.
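To make concrete what a Shapley-based explanation offers, the sketch below computes exact Shapley values for a single prediction of a small hand-written scoring function. It is a toy under stated assumptions: the function `toy_model`, its coefficients, and the feature values are hypothetical and do not reflect the study's CatBoost model or its data; only the feature names are borrowed from the abstract. It also enumerates all feature subsets against a single baseline, which is feasible for three features, whereas the SHAP library uses faster algorithms for tree models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley attributions of model(instance) relative to model(baseline)."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the instance's values; the rest stay at baseline.
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in names}
        return model(x)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_model(x):
    """Hypothetical risk score with one interaction term (illustration only)."""
    return (0.1 + 0.4 * x["sTILs"] + 0.2 * x["HER2"]
            + 0.1 * x["sTILs"] * x["HER2"] - 0.05 * x["age"])


# Hypothetical standardized feature values for one patient and a reference patient.
instance = {"sTILs": 1.0, "HER2": 1.0, "age": 1.0}
baseline = {"sTILs": 0.0, "HER2": 0.0, "age": 0.0}

phi = shapley_values(toy_model, instance, baseline)
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)

print(phi)
print("ranking by |attribution|:", ranking)
print("sum of attributions:", sum(phi.values()))
print("prediction minus baseline:", toy_model(instance) - toy_model(baseline))
```

The attributions add up to the difference between the patient's prediction and the baseline prediction, so each feature's share of the score can be read off directly. A global ranking such as the one reported in the Results is typically obtained by averaging the absolute attributions over all patients rather than from a single instance.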


Source journal

Cancer Medicine (ONCOLOGY)

CiteScore: 5.50

Self-citation rate: 2.50%

Articles published: 907

Review time: 19 weeks

About the journal: Cancer Medicine is a peer-reviewed, open access, interdisciplinary journal providing rapid publication of research from global biomedical researchers across the cancer sciences. The journal will consider submissions from all oncologic specialties, including, but not limited to, the following areas:

Clinical Cancer Research: Translational research ∙ clinical trials ∙ chemotherapy ∙ radiation therapy ∙ surgical therapy ∙ clinical observations ∙ clinical guidelines ∙ genetic consultation ∙ ethical considerations.

Cancer Biology: Molecular biology ∙ cellular biology ∙ molecular genetics ∙ genomics ∙ immunology ∙ epigenetics ∙ metabolic studies ∙ proteomics ∙ cytopathology ∙ carcinogenesis ∙ drug discovery and delivery.

Cancer Prevention: Behavioral science ∙ psychosocial studies ∙ screening ∙ nutrition ∙ epidemiology and prevention ∙ community outreach.

Bioinformatics: Gene expression profiles ∙ gene regulation networks ∙ genome bioinformatics ∙ pathway analysis ∙ prognostic biomarkers.

Cancer Medicine publishes original research articles, systematic reviews, meta-analyses, and research methods papers, along with invited editorials and commentaries. Original research papers must report well-conducted research with conclusions supported by the data presented in the paper.