Evaluation of commercial AI algorithms for the detection of fractures, effusions, and dislocations on real-world clinical data: A prospective registry study
IF 2.8 Q2 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
I. Luiken, T. Lemke, A. Komenda, A.W. Marka, S.H. Kim, M.M. Graf, S. Ziegelmayer, D. Weller, C.J. Mertens, K.K. Bressem, M.R. Makowski, L.C. Adams, P. Prucker, F. Busch
Radiography 31(6), Article 103189 (2025). DOI: 10.1016/j.radi.2025.103189. https://www.sciencedirect.com/science/article/pii/S1078817425003335
Abstract
Purpose
To prospectively evaluate and directly compare the performance of three commercial AI algorithms (Gleamer, AZmed, and Radiobotics) for detecting fractures, dislocations, and joint effusions across multiple anatomical regions in real-world adult clinical radiography.
Material and methods
In this single-center, prospective technical performance evaluation study, we assessed these algorithms on radiographs from adult patients (n = 1037; 2926 radiographs; 22 anatomical regions) at the Technical University of Munich (January–March 2025). Radiologists’ reports served as the reference standard, with CT adjudication when available. Sensitivity, specificity, accuracy, and AUC were calculated; AUCs were compared using Bonferroni-corrected DeLong tests.
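The comparison method named above can be illustrated with a minimal Python sketch of a paired DeLong test for two correlated AUCs measured on the same cases, followed by a Bonferroni adjustment across the pairwise comparisons. This is an illustration under stated assumptions, not the authors' analysis code: the function names (`delong_paired`, `bonferroni`), the labels, and the toy scores are hypothetical. It assumes each algorithm returns one score per case against a binary reference standard (binary 0/1 calls also work, with ties counted as 0.5), uses only the standard library, and relies on an O(m·n) loop that is fine for illustration but slow for large samples.

```python
import math
from itertools import combinations


def _psi(pos_score: float, neg_score: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _components(scores, labels):
    """Return the empirical AUC and DeLong's structural components V10 and V01."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]  # one per positive case
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]  # one per negative case
    auc = sum(v10) / len(v10)
    return auc, v10, v01


def _cov(a, b, mean_a, mean_b):
    """Unbiased sample covariance of two equally long sequences."""
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs on the same cases.

    Returns (auc_a, auc_b, p_value). Needs at least two positive and two negative cases.
    """
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)  # numbers of positive and negative cases
    var_a = _cov(v10_a, v10_a, auc_a, auc_a) / m + _cov(v01_a, v01_a, auc_a, auc_a) / n
    var_b = _cov(v10_b, v10_b, auc_b, auc_b) / m + _cov(v01_b, v01_b, auc_b, auc_b) / n
    cov_ab = _cov(v10_a, v10_b, auc_a, auc_b) / m + _cov(v01_a, v01_b, auc_a, auc_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        # Zero estimated variance (e.g. identical score vectors): z is undefined,
        # so by convention equal AUCs give p = 1 and unequal AUCs give p = 0.
        return auc_a, auc_b, 1.0 if auc_a == auc_b else 0.0
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard-normal p-value
    return auc_a, auc_b, p_value


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]


# Hypothetical toy data (not study data): 1 = fracture per reference standard.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = {
    "algo_A": [0.9, 0.8, 0.7, 0.3, 0.2, 0.4, 0.1, 0.6, 0.2, 0.1],
    "algo_B": [0.8, 0.9, 0.4, 0.6, 0.3, 0.2, 0.5, 0.1, 0.2, 0.3],
    "algo_C": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0],  # binary calls only
}
pairs = list(combinations(scores, 2))
raw_p = [delong_paired(scores[a], scores[b], labels)[2] for a, b in pairs]
for (a, b), p_adj in zip(pairs, bonferroni(raw_p)):
    print(f"{a} vs {b}: Bonferroni-adjusted p = {p_adj:.3f}")
```

With three algorithms there are three pairwise comparisons, so this sketch multiplies each raw p-value by 3 (capped at 1). The abstract does not state how large the correction family was in the study, for example whether it was defined per finding or across all findings.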
Results
Fractures were identified in 29.60 % of patients; 13.69 % had acute fractures and 6.65 % had multiple fractures. For all fractures, Gleamer (AUC 83.95 %, sensitivity 75.57 %, specificity 92.33 %) and AZmed (AUC 84.88 %, sensitivity 79.48 %, specificity 90.27 %) outperformed Radiobotics (AUC 77.24 %, sensitivity 60.91 %, specificity 93.56 %). For acute fractures, AUCs were comparable (range: 84.81–87.78 %). For multiple fractures, performance was limited (AUCs 64.17–73.40 %). AZmed had a higher AUC than Gleamer for dislocation (61.85 % vs. 54.48 %), while Gleamer and Radiobotics outperformed AZmed for effusion (AUCs 69.59 % and 73.63 % vs. 57.99 %). No algorithm exceeded 91 % accuracy for acute fractures.
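The all-fracture AUCs above agree, to within rounding, with the mean of each algorithm's sensitivity and specificity: Gleamer (75.57 % + 92.33 %)/2 = 83.95 %, AZmed (79.48 % + 90.27 %)/2 ≈ 84.88 %, Radiobotics (60.91 % + 93.56 %)/2 ≈ 77.24 %. This is what one expects when an algorithm contributes a single binary operating point: the ROC curve then runs through (0, 0), (1 − specificity, sensitivity) and (1, 1), and its trapezoidal area equals (sensitivity + specificity)/2. The abstract does not say how the AUCs were computed, so this is an observation rather than a statement of the authors' method. The Python sketch below reproduces the check; the `Confusion` class, the `auc_from_rates` helper, and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical 2x2 counts of algorithm calls against the reference standard."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Equals sens * prevalence + spec * (1 - prevalence), so it depends on case mix.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def binary_auc(self) -> float:
        # Trapezoidal area under the ROC curve through (0,0), (1-spec, sens), (1,1).
        return (self.sensitivity + self.specificity) / 2


def auc_from_rates(sens_pct: float, spec_pct: float) -> float:
    """Single-operating-point AUC (in %) from sensitivity and specificity (in %)."""
    return (sens_pct + spec_pct) / 2


# All-fracture sensitivity, specificity and AUC (%) as reported in the abstract.
reported = {
    "Gleamer": (75.57, 92.33, 83.95),
    "AZmed": (79.48, 90.27, 84.88),
    "Radiobotics": (60.91, 93.56, 77.24),
}

for name, (sens, spec, auc) in reported.items():
    print(f"{name}: (sens + spec) / 2 = {auc_from_rates(sens, spec):.3f} %, reported AUC = {auc} %")

# Hypothetical counts, for illustration only (not study data).
example = Confusion(tp=8, fp=1, fn=2, tn=9)
print(example.sensitivity, example.specificity, example.accuracy, example.binary_auc)
```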
Conclusion
In this real-world, single-center study, commercial AI algorithms showed moderate to high performance for straightforward fracture detection but limited accuracy for complex scenarios such as multiple fractures and dislocations.
Implications for practice
Current tools should be used as adjuncts rather than replacements for radiologists and reporting radiographers. Multicenter validation and more diverse training data are necessary to improve generalizability and robustness.
Radiography
CiteScore: 4.70
Self-citation rate: 34.60%
Articles published: 169
Review time: 63 days
About the journal:
Radiography is an international, English-language, peer-reviewed journal of diagnostic imaging and radiation therapy. Radiography is the official professional journal of the College of Radiographers and is published quarterly. Radiography aims to publish the highest-quality material, both clinical and scientific, on all aspects of diagnostic imaging, radiation therapy, and oncology.