Tongtong Huo, Pengran Liu, Mingdi Xue, Jiayao Zhang, Yi Xie, Honglin Wang, Hong Zhou, Zineng Yan, Songxiang Liu, Lin Lu, Jiaming Yang, Wei Wu, Zhewei Ye
Radiology Advances 2(3): umaf020. Published 2025-05-09. https://doi.org/10.1093/radadv/umaf020
Deep learning diagnosis of adult tibial plateau fractures: multicenter study with external validation.
Purpose: To evaluate a MobileNetV3-YOLOv8 deep learning (DL) model for detecting tibial plateau fractures (TPFs), including occult TPFs (OTPFs), on knee radiographs. We hypothesized that the DL model would improve diagnostic performance and reduce interpretation time, particularly for less experienced physicians.
Materials and methods: This retrospective multicenter study included 1543 adult patients from 5 tertiary hospitals in China. A total of 3547 radiographs were included: 2837 for training/validation and 710 from a single external center for testing. In the test set, 267 (37.6%) were normal, 282 (39.7%) were obvious TPFs, and 161 (22.7%) were OTPFs. Performance metrics comprised sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), F1-score, and the area under the receiver operating characteristic curve (AUROC). Eleven physicians (6 experienced, 5 inexperienced) interpreted 70 test images with and without DL assistance. Interreader agreement (Fleiss' κ) and interpretation time were evaluated.
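For readers less familiar with the threshold-based metrics listed above, the following minimal sketch shows how they are conventionally derived from binary confusion-matrix counts. The `ConfusionCounts` container, the `diagnostic_metrics` function, and the counts in the usage example are hypothetical illustrations, not the study's code or data; the abstract does not state how the authors computed these values or their confidence intervals. AUROC is omitted because it requires per-image scores rather than counts.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts for one target class (hypothetical container)."""
    tp: int  # fracture present, model flagged it
    fp: int  # fracture absent, model flagged it
    tn: int  # fracture absent, model did not flag it
    fn: int  # fracture present, model missed it


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard definitions of the count-based metrics named in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),   # true-positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),   # true-negative rate
        "ppv": _ratio(c.tp, c.tp + c.fp),           # positive predictive value
        "npv": _ratio(c.tn, c.tn + c.fn),           # negative predictive value
        "accuracy": _ratio(c.tp + c.tn, total),
        # F1 is the harmonic mean of PPV and sensitivity: 2TP / (2TP + FP + FN).
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Illustrative counts only; these are NOT the study's results.
    example = ConfusionCounts(tp=90, fp=20, tn=180, fn=10)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on class prevalence while sensitivity and specificity do not, values computed this way apply only to a test set with the same case mix.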
Results: For obvious TPFs, the model achieved 89.4% sensitivity (95% confidence interval [CI], 85.7-92.3), 92.5% specificity (95% CI, 89.9-95.1), 88.7% PPV, 92.9% NPV, 89.0% F1-score, and 91.9% accuracy (95% CI, 89.7-94.1). For OTPFs, it achieved 85.7% sensitivity (95% CI, 81.2-89.4), 91.3% specificity (95% CI, 88.5-93.2), 74.2% PPV, 95.6% NPV, 79.5% F1-score, and 88.2% accuracy (95% CI, 86.4-89.8). The overall AUROC was 0.949 (95% CI, 0.935-0.963). DL assistance improved OTPF sensitivity of less experienced readers (67.5% to 83.8%), increased interreader agreement (κ) (0.58 [95% CI, 0.52-0.64] to 0.71 [95% CI, 0.65-0.76]), and reduced mean interpretation time (55.8 seconds to 34.3 seconds).
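The interreader agreement reported above is Fleiss' κ, which compares the observed agreement among a fixed number of readers with the agreement expected by chance. The sketch below implements the textbook formula in plain Python; the function name `fleiss_kappa` and the input layout (one row per image, one column per diagnostic category, each cell holding the number of readers who chose that category) are assumptions made for illustration, and the abstract does not specify which categories or software the authors used.

```python
import math
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa, where ratings[i][j] is the number of readers
    who assigned image i to category j (hypothetical input layout)."""
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = sum(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(ratings[0])

    # Observed agreement: mean over items of the fraction of agreeing reader pairs.
    p_bar = sum(
        (sum(count * count for count in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items

    # Chance agreement: sum of squared overall category proportions.
    totals = [sum(row[j] for row in ratings) for j in range(n_categories)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)

    if math.isclose(p_e, 1.0):
        # Every rating fell in one category, so kappa is undefined.
        return math.nan
    return (p_bar - p_e) / (1.0 - p_e)
```

In a design like the one described, each of the 70 images would contribute one row whose entries sum to 11, the number of readers.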
Conclusion: The MobileNetV3-YOLOv8 model accurately detected both obvious and occult TPFs, substantially improving diagnostic sensitivity, interreader agreement, and efficiency. These findings suggest that AI assistance can enhance diagnostic performance and reduce interpretation time, offering considerable benefits for emergency departments where rapid and accurate fracture detection is paramount.