{"title":"Regression analysis and validation of risk factors for upper limb dysfunction following modified radical mastectomy for breast cancer patients.","authors":"Yonggang Li, Shuan Hui","doi":"10.62347/CZYA6232","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To develop and validate a predictive tool using machine learning models for identifying risk factors for upper limb dysfunction following modified radical mastectomy (MRM) in breast cancer patients.</p><p><strong>Methods: </strong>A total of 768 breast cancer patients who underwent Modified radical mastectomy (MRM) between January 2022 and December 2023 were included in this study. The dataset was divided into a training set (506 cases) and a validation set (262 cases). The collected data encompassed demographic characteristics, clinicopathological features, medical history, and postoperative rehabilitation plans. Predictive analyses were conducted using machine learning models, including support vector machine (SVM), extreme gradient boosting (XGBOOST), Gaussian naïve Bayes (GNB), adaptive boosting (ADABOOST), and random forest. Model evaluation was performed using ten-fold cross-validation, with performance metrics including receiver operating characteristic (ROC) curves, area under the curve (AUC) values, specificity, sensitivity, accuracy, and F1-score. DeLong's test was used to compare AUC values and identify the optimal predictive model.</p><p><strong>Results: </strong>Baseline characteristics showed no significant differences between the training and validation sets (P>0.05). Analysis of factors associated with upper limb dysfunction in the training set revealed significant differences in variables such as age, BMI, cancer type, axillary lymph node dissection, ipsilateral radiotherapy, postoperative rehabilitation plans, and monthly per capita household income (P<0.05). Low correlations were observed among these variables (R values close to 0), indicating minimal multicollinearity. Model performance evaluation showed that the XGBOOST and random forest models demonstrated high AUC values (0.817-0.884) across both the training and validation sets. These models also exhibited superior specificity and sensitivity, indicating strong predictive performance and robustness in identifying patients at risk of postoperative upper limb dysfunction.</p><p><strong>Conclusion: </strong>The XGBOOST and random forest models exhibited excellent predictive accuracy, offering valuable tools for the early identification and personalized management of high-risk patients. These models provide critical data support for postoperative rehabilitation planning and contribute to improving the quality of life for breast cancer patients.</p>","PeriodicalId":7731,"journal":{"name":"American journal of translational research","volume":"17 4","pages":"2614-2628"},"PeriodicalIF":1.7000,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12082522/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"American journal of translational research","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.62347/CZYA6232","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
引用次数: 0
Abstract
Objective: To develop and validate a predictive tool using machine learning models for identifying risk factors for upper limb dysfunction following modified radical mastectomy (MRM) in breast cancer patients.
Methods: A total of 768 breast cancer patients who underwent Modified radical mastectomy (MRM) between January 2022 and December 2023 were included in this study. The dataset was divided into a training set (506 cases) and a validation set (262 cases). The collected data encompassed demographic characteristics, clinicopathological features, medical history, and postoperative rehabilitation plans. Predictive analyses were conducted using machine learning models, including support vector machine (SVM), extreme gradient boosting (XGBOOST), Gaussian naïve Bayes (GNB), adaptive boosting (ADABOOST), and random forest. Model evaluation was performed using ten-fold cross-validation, with performance metrics including receiver operating characteristic (ROC) curves, area under the curve (AUC) values, specificity, sensitivity, accuracy, and F1-score. DeLong's test was used to compare AUC values and identify the optimal predictive model.
Results: Baseline characteristics showed no significant differences between the training and validation sets (P>0.05). Analysis of factors associated with upper limb dysfunction in the training set revealed significant differences in variables such as age, BMI, cancer type, axillary lymph node dissection, ipsilateral radiotherapy, postoperative rehabilitation plans, and monthly per capita household income (P<0.05). Low correlations were observed among these variables (R values close to 0), indicating minimal multicollinearity. Model performance evaluation showed that the XGBOOST and random forest models demonstrated high AUC values (0.817-0.884) across both the training and validation sets. These models also exhibited superior specificity and sensitivity, indicating strong predictive performance and robustness in identifying patients at risk of postoperative upper limb dysfunction.
Conclusion: The XGBOOST and random forest models exhibited excellent predictive accuracy, offering valuable tools for the early identification and personalized management of high-risk patients. These models provide critical data support for postoperative rehabilitation planning and contribute to improving the quality of life for breast cancer patients.