Benedikt Langenberger, Daniel Schrednitzki, Andreas Halder, Reinhard Busse, Christoph Pross
Leveraging machine learning for duration of surgery prediction in knee and hip arthroplasty - a development and validation study.
BMC Medical Informatics and Decision Making, volume 25(1), article 106. Published 2025-03-03. DOI: 10.1186/s12911-025-02927-7 (https://doi.org/10.1186/s12911-025-02927-7)
Abstract
Background: Duration of surgery (DOS) varies substantially for patients with hip and knee arthroplasty (HA/KA) and is a major risk factor for adverse events. We therefore aimed (1) to identify whether machine learning can predict DOS in HA/KA patients using retrospective data available before surgery with reasonable performance, (2) to compare whether machine learning is able to outperform multivariable regression in predictive performance and (3) to identify the most important predictor variables for DOS both in a multi- and single-hospital context.
Methods: eXtreme Gradient Boosting (XGBoost) and multivariable linear regression were used for predictions. Both models were applied to the whole dataset, which included multiple hospitals (3,704 patients), and to a single-hospital dataset (1,815 patients) from the hospital with the highest case volume in our sample. Each dataset was split into training (75%) and test (25%) data. Models were trained using 5-fold cross-validation (CV) on the training data and then applied to the test data for performance comparison.
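The split-and-validate procedure described in Methods can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' code: the function names, the synthetic data, and the `MeanBaseline` placeholder model are all assumptions made for the example. The study itself used XGBoost and multivariable linear regression, either of which would take the place of the placeholder.

```python
import random
from statistics import mean


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and hold out a fraction as test data (75/25 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


class MeanBaseline:
    """Hypothetical placeholder model: predicts the mean training duration."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def cross_validated_mae(model_factory, X, y, k=5):
    """Fit a fresh model on each training fold and average validation MAE."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        model = model_factory().fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        preds = model.predict([X[i] for i in val_idx])
        scores.append(mean_absolute_error([y[i] for i in val_idx], preds))
    return mean(scores)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic (features, duration in minutes) rows; not study data.
    rows = [((rng.uniform(40, 90),), rng.gauss(80, 15)) for _ in range(200)]
    train, test = train_test_split(rows)
    X_train, y_train = [r[0] for r in train], [r[1] for r in train]
    X_test, y_test = [r[0] for r in test], [r[1] for r in test]

    cv_mae = cross_validated_mae(MeanBaseline, X_train, y_train)
    final_model = MeanBaseline().fit(X_train, y_train)
    test_mae = mean_absolute_error(y_test, final_model.predict(X_test))
    print(f"CV MAE: {cv_mae:.2f} min, test MAE: {test_mae:.2f} min")
```

Cross-validation here touches only the training portion, so the held-out 25% is used once, for the final performance comparison.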
Results: On test data in the multi-hospital setting, the mean absolute error (MAE) of XGBoost was 12.13 min (HA) / 13.61 min (KA). In the single-hospital analysis, the MAE of XGBoost on test data was 10.87 min (HA) / 12.53 min (KA). The predictive ability of XGBoost tended to be better than that of regression in all settings, although the difference was not statistically significant. Important predictors for XGBoost were physician experience, age, body mass index, patient-reported outcome measures and, for the multi-hospital analysis, the hospital.
Conclusion: Machine learning can predict DOS in both a multi-hospital and a single-hospital setting with reasonable performance. Performance differed slightly between regression and machine learning, but the difference was not statistically significant; larger datasets may improve predictive performance. The study found that hospital indicators matter in the multi-hospital setting despite controlling for various variables, highlighting potential quality differences between hospitals.
Trial registration: The study was registered at the German Clinical Trials Register (DRKS) under DRKS00019916.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.