SEER-based machine learning prediction of bone metastasis in breast cancer: model development and validation
Ying Gao, Lei Liu, Shoujun Wang, Weijie Tao, Jinmiao Wang, Ran Duan, Hai Xie, Hideaki Takahashi, Jie Hao, Ming Gao
Gland Surgery, 14(7): 1366-1378. Published 2025-07-31 (Epub 2025-07-28). DOI: 10.21037/gs-2025-168
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12322768/pdf/
Abstract
Background: Breast cancer (BC) is the leading cancer in women. It often metastasizes to bone, worsening the prognosis. Diagnostic methods often fail to predict bone metastasis (BM). This study developed a machine learning (ML) model using the Surveillance, Epidemiology, and End Results (SEER) database for BM prediction, to refine treatments and improve outcomes.
Methods: Using SEER data, we studied 24,584 BC patients diagnosed between 2010 and 2015; BM status was radiologically confirmed. Tumor size, grade, tumor (T)/node (N) stages, and estrogen receptor (ER)/progesterone receptor (PR)/human epidermal growth factor receptor 2 (HER2) status were assessed. Stratified randomization divided the data into 70% training (n=18,438) and 30% validation (n=6,146). Six ML algorithms were developed, with an emphasis on random forest (RF). Performance was assessed by receiver operating characteristic (ROC) curve analysis [area under the curve (AUC), sensitivity, specificity, negative predictive value (NPV)]. The SHapley Additive exPlanations (SHAP) framework was used to identify key BM predictors.
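The stratified split and the AUC named in the Methods can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names `stratified_split` and `roc_auc`, the seed, and the synthetic labels are all assumptions. The reported subset sizes (18,438 and 6,146) work out to 75% and 25% of 24,584, so the sketch uses `train_fraction=0.75` to reproduce those counts; that choice is also an assumption.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Split indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, valid_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)  # per-class cut point
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)


def roc_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) identity, with average ranks for ties."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(y for _, y in pairs[i:j])
        i = j
    n_pos = sum(y_true)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Synthetic labels with the reported class counts: 1,298 BM, 23,286 non-BM.
labels = [1] * 1298 + [0] * (24584 - 1298)
train_idx, valid_idx = stratified_split(labels, train_fraction=0.75)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
valid_rate = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), round(train_rate, 4), round(valid_rate, 4))
```

Splitting within each class keeps the BM rate nearly identical in the training and validation subsets, which matters when only about 5% of patients are positive.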
Results: Of the 24,584 patients, 1,298 (5.26%) had BM. Logistic regression (LR) provided the highest specificity [0.897, 95% confidence interval (CI): 0.889-0.905], whereas the gradient boosting machine (GBM) achieved the highest sensitivity (0.658, 95% CI: 0.609-0.707). With the best sensitivity at only 0.658, better algorithms or multimodal methods are needed for case identification. The multilayer perceptron neural network (MLPNN) model achieved the highest AUC, 0.808 (95% CI: 0.798-0.818), ahead of the LR and adaptive boosting (AdaBoost) models, both with AUCs of 0.803 (95% CI: 0.793-0.813). The RF model was particularly adept at ruling out BM, with an NPV above 97%. The SHAP analysis identified tumor size, grade, T/N stages, ER/PR/HER2 status, and brain/liver/lung metastases as key predictors for risk stratification. Decision curve analysis showed that RF offered greater utility than the American Joint Committee on Cancer (AJCC) Staging System.
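To make the relation between the reported metrics concrete, the sketch below derives NPV from sensitivity, specificity, and prevalence, and computes the standard decision-curve net benefit. It is illustrative only: the sensitivity (0.658, GBM) and specificity (0.897, LR) come from different models and are combined here purely as an assumption, the 10% risk threshold is hypothetical, and the function names are not from the paper.

```python
def npv_from_rates(sensitivity, specificity, prevalence):
    """NPV = TN / (TN + FN), with TN and FN written as population fractions."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos - false_pos * threshold / (1.0 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Reference strategy that flags every patient (sensitivity 1, specificity 0)."""
    return net_benefit(1.0, 0.0, prevalence, threshold)


PREVALENCE = 0.0526   # reported BM rate
SENSITIVITY = 0.658   # reported for GBM; pairing it with LR's specificity is an assumption
SPECIFICITY = 0.897   # reported for LR
THRESHOLD = 0.10      # hypothetical risk threshold, not from the paper

npv = npv_from_rates(SENSITIVITY, SPECIFICITY, PREVALENCE)
npv_flag_nobody = npv_from_rates(0.0, 1.0, PREVALENCE)
nb_model = net_benefit(SENSITIVITY, SPECIFICITY, PREVALENCE, THRESHOLD)
nb_all = net_benefit_treat_all(PREVALENCE, THRESHOLD)
print(round(npv, 3), round(npv_flag_nobody, 3), round(nb_model, 4), round(nb_all, 4))
```

At a prevalence near 5%, even a rule that flags nobody has an NPV of one minus the prevalence (about 0.947), so a high NPV is best read alongside sensitivity rather than on its own.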
Conclusions: Our ML model demonstrates potential for predicting BM in patients with BC. It may serve as a clinical aid to identify at-risk individuals early. However, its moderate sensitivity means the model needs refinement to improve case detection. This study supports integrating ML into clinical practice to advance personalized oncology.
About the journal:
Gland Surgery (Gland Surg; GS; Print ISSN 2227-684X; Online ISSN 2227-8575) is an open access, peer-reviewed journal indexed in PubMed/PubMed Central. It was launched in May 2012 and has been published bimonthly since February 2015.