Explainable machine learning model for predicting internal mammary node metastasis in breast cancer: Multi-method development and cross-cohort validation

IF 5.7; Zone 2 (Medicine); Q1, Obstetrics & Gynecology
Yirong Xiang, Jian Tie, Siyuan Zhang, Chen Shi, Changkuo Guo, Yushuo Peng, Zhaoqing Fan, Weihu Wang
Breast, volume 82, Article 104517. Published 2025-06-09. DOI: 10.1016/j.breast.2025.104517
https://www.sciencedirect.com/science/article/pii/S096097762500534X

Abstract

Background

This study developed an explainable machine learning model to predict baseline internal mammary lymph node metastasis (IMNM) in patients with breast cancer.

Materials and methods

This study included three cohorts: a derivation cohort (n = 1997) from Peking University Cancer Hospital, a temporal testing cohort (n = 633) from the same center, and a SEER cohort (n = 51,420). Multiple machine learning strategies were applied: Least Absolute Shrinkage and Selection Operator (LASSO), Boruta, backward stepwise regression, and best subset selection were used for feature selection, and logistic regression (LR), support vector machines (SVM), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost) were used for model construction. The best-performing model was validated in the internal and temporal testing cohorts. Shapley Additive Explanations (SHAP) analysis was conducted to improve interpretability.
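To illustrate the idea behind SHAP, the sketch below computes exact Shapley values for a single patient by enumerating every feature coalition, with features outside a coalition set to a baseline value. It is a minimal illustration only: `toy_risk`, the feature names, and the patient and baseline values are hypothetical and are not the study's SVM or data. Practical SHAP tooling approximates this enumeration rather than computing it exhaustively.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Enumerates all coalitions of features; features outside a coalition
    take their baseline value. Cost grows as 2^n, so this suits only a
    handful of features.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(f):
    """Hypothetical risk score with one interaction term (not the paper's model)."""
    return (0.05
            + 0.10 * f["clinical_n_stage"]
            + 0.02 * f["tumor_size_cm"]
            + 0.03 * f["clinical_n_stage"] * f["grade"])


# Hypothetical patient and reference values, for illustration only
patient = {"clinical_n_stage": 2, "tumor_size_cm": 3.0, "grade": 3}
baseline = {"clinical_n_stage": 0, "tumor_size_cm": 1.0, "grade": 1}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes per-feature explanations of this kind readable.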

Results

Six clinical features (clinical N stage, size, stage, classification, grade and location) were used to construct the final predictive model with SVM. The model achieved robust performance, with AUCs of 0.811 (0.790–0.843), 0.806 (0.760–0.857), and 0.864 (0.830–0.926) in the training, internal testing, and temporal testing cohorts, respectively. High-risk patients had significantly worse disease-free survival (DFS; HR 2.776, 95% CI: 1.897–4.064, p < 0.001) and overall survival (OS; HR 1.962, 95% CI: 1.853–2.077, p < 0.001). An online prediction tool was established that allows users to input key clinical variables and obtain model-predicted probabilities along with SHAP-based explanations.
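An AUC reported with an interval, as above, can be reproduced in outline as follows. This is a sketch under assumptions: the abstract does not state how the intervals were computed, so a percentile bootstrap is assumed here, and the labels and scores are synthetic. The function names `auc` and `bootstrap_ci` are illustrative.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples containing only one class
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic data for illustration: about 20% positives, with scores
# shifted upward for positive cases
rng = random.Random(42)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(300)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC {point:.3f} (95% CI {low:.3f}-{high:.3f})")
```

The pairwise AUC is quadratic in the sample size, which is acceptable for a cohort of a few hundred patients; a rank-based formula would be preferable for something the size of the SEER cohort.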

Conclusion

This validated and explainable machine learning model offers a practical tool for early risk stratification, aiding clinicians in appropriate baseline imaging selection and adjuvant treatment planning.
Source journal
Breast (Medicine: Obstetrics & Gynecology)
CiteScore: 8.70
Self-citation rate: 2.60%
Articles published: 165
Review time: 59 days
About the journal: The Breast is an international, multidisciplinary journal for researchers and clinicians that focuses on translational and clinical research to advance the prevention, diagnosis, and treatment of breast cancer at all stages.