Evaluation of commercial AI algorithms for the detection of fractures, effusions, and dislocations on real-world clinical data: A prospective registry study

Impact factor: 2.8 · JCR Q2 (Radiology, Nuclear Medicine & Medical Imaging)

Radiography · Published: 2025-10-01 · DOI: 10.1016/j.radi.2025.103189

I. Luiken, T. Lemke, A. Komenda, A.W. Marka, S.H. Kim, M.M. Graf, S. Ziegelmayer, D. Weller, C.J. Mertens, K.K. Bressem, M.R. Makowski, L.C. Adams, P. Prucker, F. Busch

Radiography, Volume 31, Issue 6, Article 103189


Abstract

Purpose

To prospectively evaluate and directly compare the performance of three commercial AI algorithms (Gleamer, AZmed, and Radiobotics) for detecting fractures, dislocations, and joint effusions across multiple anatomical regions in real-world adult clinical radiography.

Material and methods

In this single-center, prospective technical performance evaluation study, we assessed these algorithms on radiographs from adult patients (n = 1037; 2926 radiographs; 22 anatomical regions) at the Technical University of Munich (January–March 2025). Radiologists’ reports served as the reference standard, with CT adjudication when available. Sensitivity, specificity, accuracy, and AUC were calculated; AUCs were compared using Bonferroni-corrected DeLong tests.
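The abstract names the statistical test but not its implementation. As a minimal sketch, assuming each algorithm yields one score per case (a binary 0/1 call also works) and all algorithms are scored against the same reference labels, the paired DeLong test compares two correlated AUCs using the nonparametric variance estimate of DeLong et al. (1988), and the Bonferroni correction multiplies each p-value by the number of comparisons. Treating the three pairwise algorithm comparisons as one family (k = 3) is an assumption, as is the unit of analysis (patient or radiograph); the function names and toy data below are hypothetical and not taken from the study.

```python
import itertools
import math


def _components(scores, labels):
    """Empirical AUC plus DeLong's structural components for one algorithm."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]

    def psi(x, y):
        # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on a tie
        return 1.0 if x > y else 0.5 if x == y else 0.0

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]  # one per positive case
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]  # one per negative case
    auc = sum(v10) / len(v10)
    return auc, v10, v01


def _cov(a, b):
    """Sample covariance with n - 1 in the denominator, as in DeLong et al."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(scores_a, scores_b, labels):
    """Two-sided DeLong test for two AUCs measured on the same cases.

    Returns (auc_a, auc_b, z, p). Needs at least two positive and two negative cases.
    """
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    # Var(AUC_a - AUC_b) = Var_a + Var_b - 2 Cov_ab, estimated from both case groups
    var = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    if var <= 0:
        # Degenerate case (e.g. identical scores): the test statistic is undefined
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the standard normal
    return auc_a, auc_b, z, p


def bonferroni(p_values):
    """Bonferroni adjustment, assuming the family is exactly the p-values passed in."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


if __name__ == "__main__":
    # Hypothetical toy data, not from the study: 1 = finding on the reference standard
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    tools = {
        "tool_A": [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.4, 0.1],
        "tool_B": [1, 1, 0, 1, 0, 0, 1, 0, 0, 0],  # binary output only
        "tool_C": [0.7, 0.4, 0.6, 0.2, 0.5, 0.3, 0.1, 0.2, 0.1, 0.3],
    }
    pairs = list(itertools.combinations(tools, 2))  # three pairwise comparisons
    raw = [delong_paired(tools[a], tools[b], labels)[3] for a, b in pairs]
    for (a, b), p, p_adj in zip(pairs, raw, bonferroni(raw)):
        print(f"{a} vs {b}: p = {p:.3f}, Bonferroni-adjusted p = {p_adj:.3f}")
```

Bonferroni is deliberately conservative: with k = 3, an unadjusted p-value must fall below 0.05 / 3 ≈ 0.0167 to remain significant at the 5 % level.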

Results

Fractures were identified in 29.60 % of patients; 13.69 % had acute fractures and 6.65 % had multiple fractures. For all fractures, Gleamer (AUC 83.95 %, sensitivity 75.57 %, specificity 92.33 %) and AZmed (AUC 84.88 %, sensitivity 79.48 %, specificity 90.27 %) outperformed Radiobotics (AUC 77.24 %, sensitivity 60.91 %, specificity 93.56 %). For acute fractures, AUCs were comparable (range: 84.81–87.78 %). For multiple fractures, performance was limited (AUCs 64.17–73.40 %). AZmed had a higher AUC for dislocation (61.85 % vs. 54.48 % for Gleamer), while Gleamer and Radiobotics outperformed AZmed for effusion (AUC 69.59 % and 73.63 % vs. 57.99 %). No algorithm exceeded 91 % accuracy for acute fractures.
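A quick consistency check on the all-fracture figures: when a tool outputs only a binary call, its ROC curve has a single operating point, and the trapezoidal area under (0, 0), (1 − specificity, sensitivity), (1, 1) reduces to (sensitivity + specificity) / 2. For the three algorithms this gives 83.95, 84.875 and 77.235 %, which round to the reported AUCs. That suggests, although the abstract does not state it, that the AUCs were computed from binary outputs; the sketch below assumes so, and its function names are illustrative rather than taken from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # detected share of reference-positive cases
    specificity = tn / (tn + fp)  # correctly cleared share of reference-negative cases
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve given (FPR, TPR) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def binary_auc(sensitivity, specificity):
    """AUC of a single-threshold classifier: (0,0) -> (1 - spec, sens) -> (1,1)."""
    return trapezoid_auc([(0.0, 0.0), (1.0 - specificity, sensitivity), (1.0, 1.0)])


# All-fracture figures from the abstract, in percent: (sensitivity, specificity, reported AUC)
REPORTED = {
    "Gleamer": (75.57, 92.33, 83.95),
    "AZmed": (79.48, 90.27, 84.88),
    "Radiobotics": (60.91, 93.56, 77.24),
}

if __name__ == "__main__":
    for name, (sens, spec, auc) in REPORTED.items():
        implied = 100 * binary_auc(sens / 100, spec / 100)
        print(f"{name}: implied AUC {implied:.3f} %, reported {auc:.2f} %")
```

The same check cannot be run for acute fractures, multiple fractures, dislocation or effusion, because the abstract reports only AUCs for those findings.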

Conclusion

In this real-world, single-center study, commercial AI algorithms showed moderate to high performance for straightforward fracture detection but limited accuracy for complex scenarios such as multiple fractures and dislocations.

Implications for practice

Current tools should be used as adjuncts rather than replacements for radiologists and reporting radiographers. Multicenter validation and more diverse training data are necessary to improve generalizability and robustness.


Source journal

Radiography (Radiology, Nuclear Medicine & Medical Imaging)

CiteScore: 4.70

Self-citation rate: 34.60%

Articles published: 169

Review time: 63 days

About the journal: Radiography is an international, English-language, peer-reviewed journal of diagnostic imaging and radiation therapy. Radiography is the official professional journal of the College of Radiographers and is published quarterly. Radiography aims to publish the highest quality material, both clinical and scientific, on all aspects of diagnostic imaging and radiation therapy and oncology.