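Interpretable Machine Learning for Predicting Neoadjuvant Chemotherapy Response in Breast Cancer Using the Baseline Clinical and Pathological Characteristics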

IF 3.1 | Zone 2 (Medicine) | JCR Q2 (Oncology)
Cancer Medicine Pub Date : 2025-09-08 DOI:10.1002/cam4.71221
Shan Fang, Jun Zhang, Chengyan Han, Mingxiang Kong, Haibo Zhang, Miaochun Zhong, Wuzhen Chen, Hongjun Yuan, Wenjie Xia, Wei Zhang
Citations: 0

Interpretable Machine Learning for Predicting Neoadjuvant Chemotherapy Response in Breast Cancer Using the Baseline Clinical and Pathological Characteristics

Background

The pathological response to neoadjuvant chemotherapy (NAC) has become a vital prognostic indicator for patients with breast cancer (BC). Recently developed prediction models have relied on rather basic imaging and pathology characteristics and have not sufficiently explained the importance of the data they incorporate. This study aimed to establish and validate a machine learning model for predicting pathological complete response (pCR) to NAC in BC patients using baseline clinical and pathological features.

Methods

Data were collected from hospitalized BC patients treated with NAC at Zhejiang Provincial People's Hospital between January 2014 and August 2023. The dataset was randomly split, with 70% allocated to model training and 30% to validation. LASSO regression was used to select predictive features. Six machine learning (ML) models (XGBoost, LightGBM, CatBoost, logistic regression, random forest [RF], and support vector machine [SVM]) were developed, and their performance was assessed using the area under the curve (AUC), accuracy, precision, recall, F1 score, and Brier score. Clinical benefit was evaluated using decision curve analysis (DCA), and SHapley Additive exPlanations (SHAP) was applied to interpret the features of the optimal ML model.
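The evaluation metrics named in the Methods can be illustrated with a minimal, standard-library sketch. This is not the authors' code: the function names and the toy labels and probabilities are hypothetical, and the 0.5 classification threshold is an assumption made only for the example. AUC is computed as the fraction of positive/negative pairs that are ranked correctly, and the decision-curve net benefit uses the usual definition TP/n - FP/n * pt/(1 - pt) at threshold probability pt.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def classification_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 after thresholding the probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    tn = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit of treating everyone with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data (1 = pCR, 0 = no pCR); not taken from the study.
    labels = [1, 0, 1, 0, 0, 1, 0, 0]
    probs = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5]
    print("AUC:", auc(labels, probs))
    print("Brier:", brier_score(labels, probs))
    print(classification_metrics(labels, probs))
    print("Net benefit at 0.3:", net_benefit(labels, probs, 0.3))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the model against the "treat all" and "treat none" strategies.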

Results

A total of 303 BC patients treated with NAC were included, with a pCR rate of 29.37% (89/303). Twelve features, including age, menopausal status, progesterone receptor (PR) status, HER2 status, Ki-67 expression, and stromal tumor-infiltrating lymphocytes (sTILs), were selected for model construction. Among the six models, CatBoost showed the best predictive performance, achieving an AUC of 0.853 after Bayesian hyperparameter tuning. SHAP analysis ranked sTILs as the most important predictive feature. In fivefold cross-validation, the CatBoost model incorporating sTILs achieved a mean AUC of 0.83.
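To make the idea behind a SHAP feature ranking concrete, the sketch below computes exact Shapley values for a tiny toy scoring function by enumerating every feature coalition. Everything in it is an assumption for illustration: `toy_score`, its weights, and the patient and baseline values are invented and have no relation to the study's CatBoost model or data. Tree-based SHAP implementations use more efficient algorithms and a background dataset rather than a single baseline, so this shows only the underlying definition.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(x)
    n = len(names)

    def value(coalition: set) -> float:
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_score(f: Dict[str, float]) -> float:
    """Hypothetical score with made-up weights and one interaction term."""
    return (
        0.04 * f["sTILs"]
        + 0.5 * f["HER2_positive"]
        + 0.01 * f["Ki67"]
        + 0.01 * f["sTILs"] * f["HER2_positive"]
    )


if __name__ == "__main__":
    # Hypothetical patient and reference values, chosen only for the example.
    patient = {"sTILs": 40.0, "HER2_positive": 1.0, "Ki67": 30.0}
    reference = {"sTILs": 10.0, "HER2_positive": 0.0, "Ki67": 20.0}
    phi = shapley_values(toy_score, patient, reference)
    for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for the patient and the score for the reference, which is the property that lets a SHAP plot decompose a single prediction feature by feature.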

Conclusions

The ML-based model enables more accurate prediction of pCR in BC patients at baseline, which can help optimize treatment strategies. In addition, the interpretable SHAP framework makes the model more transparent, fostering trust and understanding among clinicians.

Source journal
Cancer Medicine (Oncology)
CiteScore: 5.50
Self-citation rate: 2.50%
Articles published: 907
Review time: 19 weeks
Journal description: Cancer Medicine is a peer-reviewed, open access, interdisciplinary journal providing rapid publication of research from global biomedical researchers across the cancer sciences. The journal will consider submissions from all oncologic specialties, including, but not limited to, the following areas. Clinical Cancer Research: Translational research ∙ clinical trials ∙ chemotherapy ∙ radiation therapy ∙ surgical therapy ∙ clinical observations ∙ clinical guidelines ∙ genetic consultation ∙ ethical considerations. Cancer Biology: Molecular biology ∙ cellular biology ∙ molecular genetics ∙ genomics ∙ immunology ∙ epigenetics ∙ metabolic studies ∙ proteomics ∙ cytopathology ∙ carcinogenesis ∙ drug discovery and delivery. Cancer Prevention: Behavioral science ∙ psychosocial studies ∙ screening ∙ nutrition ∙ epidemiology and prevention ∙ community outreach. Bioinformatics: Gene expression profiles ∙ gene regulation networks ∙ genome bioinformatics ∙ pathway analysis ∙ prognostic biomarkers. Cancer Medicine publishes original research articles, systematic reviews, meta-analyses, and research methods papers, along with invited editorials and commentaries. Original research papers must report well-conducted research with conclusions supported by the data presented in the paper.