Reinier W A Spek, William J Smith, Marat Sverdlov, Sebastiaan Broos, Yang Zhao, Zhibin Liao, Johan W Verjans, Jasper Prijs, Minh-Son To, Henrik Åberg, Wael Chiri, Frank F A IJpma, Bhavin Jadav, John White, Gregory I Bain, Paul C Jutte, Michel P J van den Bekerom, Ruurd L Jaarsma, Job N Doornberg, Soheil Ashkani, Nick Assink, Joost W Colaris, Nynke V der Gaast, Prakash Jayakumar, Laura J Kim, Huub H de Klerk, Joost Kuipers, Wouter H Mallee, Anne M L Meesters, Stijn R J Mennes, Miriam G E Oldhof, Peter A J Pijpker, Ching Yiu Lau, Mathieu M E Wijffels, Arno D Wolf
{"title":"Detection, classification, and characterization of proximal humerus fractures on plain radiographs.","authors":"Reinier W A Spek, William J Smith, Marat Sverdlov, Sebastiaan Broos, Yang Zhao, Zhibin Liao, Johan W Verjans, Jasper Prijs, Minh-Son To, Henrik Åberg, Wael Chiri, Frank F A IJpma, Bhavin Jadav, John White, Gregory I Bain, Paul C Jutte, Michel P J van den Bekerom, Ruurd L Jaarsma, Job N Doornberg, Soheil Ashkani, Nick Assink, Joost W Colaris, Nynke V der Gaast, Prakash Jayakumar, Laura J Kim, Huub H de Klerk, Joost Kuipers, Wouter H Mallee, Anne M L Meesters, Stijn R J Mennes, Miriam G E Oldhof, Peter A J Pijpker, Ching Yiu Lau, Mathieu M E Wijffels, Arno D Wolf","doi":"10.1302/0301-620X.106B11.BJJ-2024-0264.R1","DOIUrl":null,"url":null,"abstract":"<p><strong>Aims: </strong>The purpose of this study was to develop a convolutional neural network (CNN) for fracture detection, classification, and identification of greater tuberosity displacement ≥ 1 cm, neck-shaft angle (NSA) ≤ 100°, shaft translation, and articular fracture involvement, on plain radiographs.</p><p><strong>Methods: </strong>The CNN was trained and tested on radiographs sourced from 11 hospitals in Australia and externally validated on radiographs from the Netherlands. Each radiograph was paired with corresponding CT scans to serve as the reference standard based on dual independent evaluation by trained researchers and attending orthopaedic surgeons. Presence of a fracture, classification (non- to minimally displaced; two-part, multipart, and glenohumeral dislocation), and four characteristics were determined on 2D and 3D CT scans and subsequently allocated to each series of radiographs. Fracture characteristics included greater tuberosity displacement ≥ 1 cm, NSA ≤ 100°, shaft translation (0% to < 75%, 75% to 95%, > 95%), and the extent of articular involvement (0% to < 15%, 15% to 35%, or > 35%).</p><p><strong>Results: </strong>For detection and classification, the algorithm was trained on 1,709 radiographs (n = 803), tested on 567 radiographs (n = 244), and subsequently externally validated on 535 radiographs (n = 227). For characterization, healthy shoulders and glenohumeral dislocation were excluded. The overall accuracy for fracture detection was 94% (area under the receiver operating characteristic curve (AUC) = 0.98) and for classification 78% (AUC 0.68 to 0.93). Accuracy to detect greater tuberosity fracture displacement ≥ 1 cm was 35.0% (AUC 0.57). The CNN did not recognize NSAs ≤ 100° (AUC 0.42), nor fractures with ≥ 75% shaft translation (AUC 0.51 to 0.53), or with ≥ 15% articular involvement (AUC 0.48 to 0.49). For all objectives, the model's performance on the external dataset showed similar accuracy levels.</p><p><strong>Conclusion: </strong>CNNs proficiently rule out proximal humerus fractures on plain radiographs. Despite rigorous training methodology based on CT imaging with multi-rater consensus to serve as the reference standard, artificial intelligence-driven classification is insufficient for clinical implementation. The CNN exhibited poor diagnostic ability to detect greater tuberosity displacement ≥ 1 cm and failed to identify NSAs ≤ 100°, shaft translations, or articular fractures.</p>","PeriodicalId":48944,"journal":{"name":"Bone & Joint Journal","volume":"106-B 11","pages":"1348-1360"},"PeriodicalIF":4.9000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bone & Joint Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1302/0301-620X.106B11.BJJ-2024-0264.R1","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
引用次数: 0
Abstract
Aims: The purpose of this study was to develop a convolutional neural network (CNN) for fracture detection, classification, and identification of greater tuberosity displacement ≥ 1 cm, neck-shaft angle (NSA) ≤ 100°, shaft translation, and articular fracture involvement, on plain radiographs.
Methods: The CNN was trained and tested on radiographs sourced from 11 hospitals in Australia and externally validated on radiographs from the Netherlands. Each radiograph was paired with corresponding CT scans to serve as the reference standard based on dual independent evaluation by trained researchers and attending orthopaedic surgeons. Presence of a fracture, classification (non- to minimally displaced; two-part, multipart, and glenohumeral dislocation), and four characteristics were determined on 2D and 3D CT scans and subsequently allocated to each series of radiographs. Fracture characteristics included greater tuberosity displacement ≥ 1 cm, NSA ≤ 100°, shaft translation (0% to < 75%, 75% to 95%, > 95%), and the extent of articular involvement (0% to < 15%, 15% to 35%, or > 35%).
Results: For detection and classification, the algorithm was trained on 1,709 radiographs (n = 803), tested on 567 radiographs (n = 244), and subsequently externally validated on 535 radiographs (n = 227). For characterization, healthy shoulders and glenohumeral dislocation were excluded. The overall accuracy for fracture detection was 94% (area under the receiver operating characteristic curve (AUC) = 0.98) and for classification 78% (AUC 0.68 to 0.93). Accuracy to detect greater tuberosity fracture displacement ≥ 1 cm was 35.0% (AUC 0.57). The CNN did not recognize NSAs ≤ 100° (AUC 0.42), nor fractures with ≥ 75% shaft translation (AUC 0.51 to 0.53), or with ≥ 15% articular involvement (AUC 0.48 to 0.49). For all objectives, the model's performance on the external dataset showed similar accuracy levels.
Conclusion: CNNs proficiently rule out proximal humerus fractures on plain radiographs. Despite rigorous training methodology based on CT imaging with multi-rater consensus to serve as the reference standard, artificial intelligence-driven classification is insufficient for clinical implementation. The CNN exhibited poor diagnostic ability to detect greater tuberosity displacement ≥ 1 cm and failed to identify NSAs ≤ 100°, shaft translations, or articular fractures.
期刊介绍:
We welcome original articles from any part of the world. The papers are assessed by members of the Editorial Board and our international panel of expert reviewers, then either accepted for publication or rejected by the Editor. We receive over 2000 submissions each year and accept about 250 for publication, many after revisions recommended by the reviewers, editors or statistical advisers. A decision usually takes between six and eight weeks. Each paper is assessed by two reviewers with a special interest in the subject covered by the paper, and also by members of the editorial team. Controversial papers will be discussed at a full meeting of the Editorial Board. Publication is between four and six months after acceptance.