Development of Explainable Machine Learning Models to Identify Patients at Risk for 1-Year Mortality and New Distant Metastases Postendoprosthetic Reconstruction for Lower Extremity Bone Tumors: A Secondary Analysis of the PARITY Trial.

Journal impact factor: 3.8 (JCR Q2, Orthopedics)

JBJS Open Access. Published 2025-05-22; eCollection date 2025-04-01. DOI: 10.2106/JBJS.OA.24.00213

Jiawen Deng, Myron Moskalyk, Madhur Nayan, Ahmed Aoude, Michelle Ghert, Sahir Bhatnagar, Anthony Bozzo

Volume 10, Issue 2. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12080683/pdf/

Citations: 0

Abstract

Background: Accurate prediction of postoperative metastasis and mortality risks in patients undergoing lower-limb oncological resection and endoprosthetic reconstruction is essential for guiding adjuvant therapies and managing patient expectations. Current prediction methods are limited by variability in patient-specific factors. This study aims to develop and internally validate explainable machine learning (ML) models to predict the 1-year risk of new distant metastases and mortality in these patients.

Methods: We performed a secondary analysis of data from the Prophylactic Antibiotic Regimens in Tumor Surgery (PARITY) trial, which included 604 patients. Candidate features were selected based on availability and clinical relevance and then narrowed using Least Absolute Shrinkage and Selection Operator (LASSO) regression and the Boruta algorithm. Six ML classification algorithms were tuned and calibrated: logistic regression, support vector machines, random forest, Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and neural networks. Because percent tumor necrosis had a high missing-data rate (>30%), models were developed both with and without it. Hyperparameters were tuned using Bayesian optimization. Internal validation was conducted on a 30% hold-out set. Model explainability was assessed using permutation-based feature importance and SHapley Additive exPlanations (SHAP).
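
Permutation-based feature importance, one of the two explainability methods named in the Methods, scores each feature by how much a model's discrimination (here AUC-ROC) drops when that feature's values are shuffled across patients. The Python sketch below illustrates the idea using only the standard library. It is not the study's pipeline: the scorer `toy_risk_score`, the feature names, and the synthetic cohort are hypothetical stand-ins for a fitted LightGBM model and the PARITY data.

```python
import random


def auc_roc(y_true, scores):
    """AUC-ROC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, y_true, n_repeats=20, seed=0):
    """Mean drop in AUC-ROC when one feature column at a time is shuffled.

    `predict` maps a row (dict of feature name -> value) to a risk score.
    """
    rng = random.Random(seed)
    baseline = auc_roc(y_true, [predict(r) for r in rows])
    importances = {}
    for feature in rows[0]:
        drops = []
        for _ in range(n_repeats):
            column = [r[feature] for r in rows]
            rng.shuffle(column)  # break the link between this feature and the outcome
            permuted = [{**r, feature: v} for r, v in zip(rows, column)]
            drops.append(baseline - auc_roc(y_true, [predict(r) for r in permuted]))
        importances[feature] = sum(drops) / n_repeats
    return importances


def toy_risk_score(row):
    # Hypothetical stand-in for a fitted model: uses age and tumor size, ignores "noise".
    return 0.03 * row["age"] + 0.2 * row["tumor_size_cm"]


# Synthetic cohort (made up for illustration; not PARITY data).
data_rng = random.Random(42)
rows, labels = [], []
for _ in range(200):
    row = {
        "age": data_rng.uniform(15, 80),
        "tumor_size_cm": data_rng.uniform(2, 20),
        "noise": data_rng.random(),
    }
    # Event probability rises with the same score, clipped to [0, 1].
    event_prob = min(1.0, max(0.0, (toy_risk_score(row) - 1.0) / 4.0))
    labels.append(1 if data_rng.random() < event_prob else 0)
    rows.append(row)

importances = permutation_importance(toy_risk_score, rows, labels)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:>14}: {value:.3f}")
```

Because `toy_risk_score` ignores the `noise` column, shuffling it leaves every score unchanged and its importance is exactly zero. In practice, importances are usually computed on held-out data rather than on the rows the model was trained on.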

Results: LightGBM was identified as the best-performing algorithm for both outcomes. For 1-year mortality prediction without percent necrosis, LightGBM achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.78 (95% confidence interval [CI] 0.70-0.86) during cross-validation and 0.72 on internal validation. For distant metastasis prediction, the LightGBM model without percent necrosis achieved an AUC-ROC of 0.77 (95% CI 0.71-0.84) during cross-validation and 0.77 on internal validation. Including percent necrosis did not significantly improve model performance. The top predictors identified were patient age, largest tumor dimension, and tumor stage.
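
An AUC-ROC such as 0.78 can be read as the probability that a randomly chosen patient who had the event receives a higher predicted risk than one who did not. The sketch below computes AUC-ROC from ranks (the Mann-Whitney U formulation) and attaches a percentile-bootstrap 95% CI. The abstract does not state how its confidence intervals were obtained (they are reported for cross-validation), so the bootstrap here is an assumed, illustrative method; `bootstrap_auc_ci` is a hypothetical helper, and the labels and scores are synthetic.

```python
import random


def auc_roc(y_true, scores):
    """Rank-based (Mann-Whitney U) AUC-ROC; tied scores share their average rank."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the run
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC-ROC plus a percentile-bootstrap (1 - alpha) CI over resampled patients."""
    point = auc_roc(y_true, scores)  # also rejects single-class input up front
    rng = random.Random(seed)
    n = len(y_true)
    boot = []
    while len(boot) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # skip resamples that contain only one class
            boot.append(auc_roc(y_b, [scores[i] for i in idx]))
    boot.sort()
    lower = boot[int(alpha / 2 * n_boot)]
    upper = boot[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Synthetic example (not study data): events get scores shifted upward.
rng = random.Random(1)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]
auc, lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC-ROC {auc:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The rank formulation gives the same value as counting correctly ordered (event, non-event) pairs but scales as a sort rather than a double loop. Resamples that happen to contain a single class are skipped because AUC-ROC is undefined for them.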

Conclusions: Explainable ML models can effectively predict the 1-year risk of mortality and new distant metastases in patients undergoing lower-limb oncological resection and endoprosthetic reconstruction. Further external validation and consideration of other data modalities are required before integrating these ML-driven risk assessments into routine clinical practice.

Level of evidence: Level II, Prognostic Study. See Instructions for Authors for a complete description of levels of evidence.


