SEER-based machine learning prediction of bone metastasis in breast cancer: model development and validation.

IF 1.6, Zone 3 (Medicine), JCR Q3 (Surgery)

Gland Surgery. Pub Date: 2025-07-31. Epub Date: 2025-07-28. DOI: 10.21037/gs-2025-168

Ying Gao, Lei Liu, Shoujun Wang, Weijie Tao, Jinmiao Wang, Ran Duan, Hai Xie, Hideaki Takahashi, Jie Hao, Ming Gao

Gland Surg. 2025;14(7):1366-1378. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12322768/pdf/

Citations: 0

Abstract

Background: Breast cancer (BC) is the most common cancer in women. It often metastasizes to bone, which worsens prognosis. Current diagnostic methods often fail to predict bone metastasis (BM). This study developed a machine learning (ML) model for BM prediction using the Surveillance, Epidemiology, and End Results (SEER) database, with the aim of refining treatment and improving outcomes.

Methods: Using SEER data, we studied 24,584 patients diagnosed with BC between 2010 and 2015; BM cases were radiologically confirmed. Tumor size, grade, tumor (T)/node (N) stages, and estrogen receptor (ER)/progesterone receptor (PR)/human epidermal growth factor receptor 2 (HER2) status were assessed. Stratified randomization divided the data into 70% training (n=18,438) and 30% validation (n=6,146). Six ML algorithms were developed, with emphasis on random forest (RF). Performance was assessed by receiver operating characteristic (ROC) curve analysis [area under the curve (AUC)], sensitivity, specificity, and negative predictive value (NPV). The SHapley Additive exPlanations (SHAP) framework was used to identify key BM predictors.
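The evaluation steps named in the Methods can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names (`stratified_split`, `confusion_metrics`, `auc`), the seed, and the toy cohort are all assumptions introduced here, and the abstract does not say which software or decision threshold was used.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.70, seed=0):
    """Split indices so each class keeps roughly the same share in both parts.

    Hypothetical helper; the source only states that the split was stratified.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, valid = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_frac)
        train.extend(indices[:cut])
        valid.extend(indices[cut:])
    return sorted(train), sorted(valid)


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and NPV from binary labels (1 = BM, 0 = no BM).

    Assumes both classes occur and at least one negative prediction is made.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in cohort size, so only for small demos.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 190  # toy cohort, NOT SEER data
    train_idx, valid_idx = stratified_split(labels)
    print(len(train_idx), len(valid_idx))
    print(sum(labels[i] for i in train_idx), sum(labels[i] for i in valid_idx))
```

Splitting within each class matters here because BM is rare in the cohort: a plain random split could leave the validation set with too few positive cases for a stable sensitivity estimate.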

Results: Of the 24,584 patients, 1,298 (5.26%) had BM. Logistic regression (LR) achieved the highest specificity [0.897, 95% confidence interval (CI): 0.889-0.905], whereas the gradient boosting machine (GBM) achieved the highest sensitivity (0.658, 95% CI: 0.609-0.707). Because even the best sensitivity was only 0.658, better algorithms or multimodal methods are needed for case identification. The multilayer perceptron neural network (MLPNN) model achieved the highest AUC, 0.808 (95% CI: 0.798-0.818), narrowly ahead of the LR and adaptive boosting (AdaBoost) models, each with an AUC of 0.803 (95% CI: 0.793-0.813). The RF model was particularly good at ruling out BM, with an NPV above 97%. The SHAP analysis identified tumor size, grade, T/N stages, ER/PR/HER2 status, and brain/liver/lung metastases as key predictors for risk stratification. Decision curve analysis showed greater clinical utility for RF than for the American Joint Committee on Cancer (AJCC) staging system.
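Two of the reported quantities depend strongly on how rare BM is, which a short calculation makes concrete. The sketch below uses the standard definitions of NPV and of net benefit in decision curve analysis; the function names and the example operating point are assumptions for illustration, since the abstract does not report the RF model's sensitivity, specificity, or decision threshold.

```python
def npv_from_rates(sensitivity, specificity, prevalence):
    """NPV implied by sensitivity, specificity and disease prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit per patient at a risk threshold (decision curve analysis).

    True positives are credited, and false positives are penalised by the
    odds of the threshold probability.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos - false_pos * threshold / (1 - threshold)


if __name__ == "__main__":
    prevalence = 0.0526  # BM rate as reported in the Results

    # Reference point: calling every patient BM-free (sensitivity 0, specificity 1)
    print(f"NPV, no model: {npv_from_rates(0.0, 1.0, prevalence):.4f}")

    # Hypothetical operating point, NOT taken from the paper
    sens, spec = 0.60, 0.85
    print(f"NPV, example model: {npv_from_rates(sens, spec, prevalence):.4f}")
    print(f"Net benefit at 10% threshold: {net_benefit(sens, spec, prevalence, 0.10):.4f}")
```

With BM present in about 5% of patients, labelling everyone as BM-free already gives an NPV of roughly 94.7%, so an NPV above 97% is best read as an improvement over that baseline rather than against 0%.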

Conclusions: Our ML model shows potential for predicting BM in patients with BC and may serve as a clinical aid for identifying at-risk individuals early. However, its moderate sensitivity needs to be improved for better case detection. This study supports integrating ML into clinical practice to advance personalized oncology.


Source journal

Gland Surgery (Medicine: Surgery)

CiteScore: 3.60
Self-citation rate: 0.00%
Articles published: 113

Journal description: Gland Surgery (Gland Surg; GS; Print ISSN 2227-684X; Online ISSN 2227-8575), indexed by PubMed/PubMed Central, is an open-access, peer-reviewed journal launched in May 2012 and published bimonthly since February 2015.