Open-source convolutional neural network to classify distal radial fractures according to the AO/OTA classification on plain radiographs
Koen D Oude Nijhuis, Jasper Prijs, Britt Barvelink, Hans van Luit, Yang Zhao, Zhibin Liao, Ruurd L Jaarsma, Frank F A IJpma, Mathieu M E Wijffels, Job N Doornberg, Joost W Colaris
European Journal of Trauma and Emergency Surgery: official publication of the European Trauma Society, 51(1):261. Published 2025-07-21.
DOI: https://doi.org/10.1007/s00068-025-02931-6
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12279608/pdf/
Abstract
Purpose: Convolutional Neural Networks (CNNs) have shown promise in fracture detection, but their ability to improve surgeons' inconsistent fracture classification remains unstudied. Therefore, our aim was to create and (externally) validate an open-source CNN algorithm that classifies distal radial fractures (DRFs) according to the AO/OTA classification system.
Methods: Patients with postero-anterior, lateral and oblique radiographs were included. Radiographs were classified according to the AO/OTA classification and were used to train a CNN algorithm. The algorithm was tested on an internal validation set and an external validation set (from two other level 1 trauma centers), with the DRFs classified by three independent surgeons.
Results: 659 radiographs were used to train the algorithm. The internal and external validation sets contained 190 and 188 patients, respectively. On internal validation, the CNN had an accuracy of 62% and an area under the receiver operating characteristic curve (AUC) of 0.63-0.93 (type 2R3A 0.84, type 2R3B 0.63, type 2R3C 0.75, and no DRF 0.93). On external validation, the algorithm had an accuracy of 61% and an AUC of 0.56-0.88 (type 2R3A 0.82, type 2R3B 0.56, type 2R3C 0.75, and no DRF 0.88).
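The abstract does not state how the per-class AUC values were computed. A common approach for a four-class problem is one-vs-rest: each class in turn is treated as the positive label and scored against all others. The following is a minimal sketch under that assumption; the function names, the toy labels and the toy probabilities are all hypothetical and are not the study's data or code.

```python
from typing import Dict, List, Sequence

# Class labels as named in the abstract; their order here is an assumption.
CLASSES = ["2R3A", "2R3B", "2R3C", "no DRF"]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of cases whose predicted class equals the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    y_true: Sequence[str], probs: Sequence[Sequence[float]], classes: List[str]
) -> Dict[str, float]:
    """One AUC per class: that class against all other classes combined."""
    return {
        c: binary_auc([t == c for t in y_true], [row[i] for row in probs])
        for i, c in enumerate(classes)
    }


# Invented toy data: six cases with predicted class probabilities per case.
y_true = ["2R3A", "2R3B", "2R3C", "no DRF", "2R3A", "2R3B"]
probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.30, 0.40, 0.20, 0.10],
    [0.10, 0.30, 0.50, 0.10],
    [0.05, 0.05, 0.10, 0.80],
    [0.50, 0.20, 0.20, 0.10],
    [0.20, 0.20, 0.50, 0.10],
]
# Predicted class = class with the highest probability.
y_pred = [CLASSES[max(range(len(row)), key=row.__getitem__)] for row in probs]

print("accuracy:", accuracy(y_true, y_pred))
print("per-class AUC:", one_vs_rest_auc(y_true, probs, CLASSES))
```

Accuracy and AUC measure different things: accuracy scores only the single top prediction per case, while one-vs-rest AUC scores how well each class's probability ranks true cases above the rest. That is why a model can report a modest overall accuracy alongside a high AUC for individual classes.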
Conclusion: The presented algorithm demonstrated excellent accuracy in classifying type 2R3A DRFs and in excluding DRFs. However, accuracy in classifying 2R3B and 2R3C DRFs according to the AO/OTA system was poor to moderate, similar to surgeons' limited inter-observer agreement. These results show that, despite previous excellence in fracture detection, CNN algorithms struggle with classification, which potentially reflects inherent problems with these classification systems.