YOLOv12 Algorithm-Aided Detection and Classification of Lateral Malleolar Avulsion Fracture and Subfibular Ossicle Based on CT Images: Multicenter Study.

Impact Factor 3.8 | CAS Tier 3 (Medicine) | JCR Q2, Medical Informatics
Jiayi Liu, Peng Sun, Yousheng Yuan, Zihan Chen, Ke Tian, Qian Gao, Xiangsheng Li, Liang Xia, Jun Zhang, Nan Xu
JMIR Medical Informatics. Published October 3, 2025; article e79064. doi: 10.2196/79064

Abstract

Background: Lateral malleolar avulsion fractures (LMAFs) and subfibular ossicles (SFOs) are distinct entities that both present as small bone fragments near the lateral malleolus on imaging but require different treatment strategies. Clinical and radiological differentiation is challenging, which can impede timely and precise management. Magnetic resonance imaging (MRI) is the diagnostic gold standard for differentiating LMAFs from SFOs, whereas differentiation using computed tomography (CT) alone is challenging in routine practice. Deep convolutional neural networks (DCNNs) have shown promise in musculoskeletal imaging diagnostics, but robust, multicenter evidence in this specific context is lacking.

Objective: This study aims to evaluate several state-of-the-art DCNNs, including the latest You Only Look Once (YOLO) v12 algorithm, for detecting and classifying LMAFs and SFOs in CT images, using MRI-based diagnoses as the gold standard, and to compare model performance with that of radiologists reading CT alone.

Methods: In this retrospective study, 1918 patients (LMAF: n=1253, 65.3%; SFO: n=665, 34.7%) were enrolled from 2 hospitals in China between 2014 and 2024. MRI served as the gold standard and was independently interpreted by 2 senior musculoskeletal radiologists. Only CT images were used for model training, validation, and testing; these were manually annotated with bounding boxes. The cohort was randomly split into a training set (n=1092, 56.93%), an internal validation set (n=476, 24.82%), and an external test set (n=350, 18.25%). Four deep learning models (Faster R-CNN, single shot multibox detector [SSD], RetinaNet, and YOLOv12) were trained and evaluated using identical procedures. Model performance was assessed using mean average precision at an intersection over union threshold of 0.5 (mAP50), area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. The external test set was also independently interpreted by 2 musculoskeletal radiologists with 7 and 15 years of experience, and their results were compared with the best-performing model. Saliency maps were generated using Shapley values to enhance interpretability.
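The abstract does not name the training framework. As a minimal sketch, assuming the Ultralytics package (which distributes YOLO12 checkpoints) were used, training a two-class detector on the bounding-box-annotated CT slices could look like the following; the dataset config file, model scale, and hyperparameters are illustrative assumptions, not the authors' settings.

```python
# pip install ultralytics
# Minimal sketch, assuming the Ultralytics implementation of YOLO12;
# the paper does not state which framework or model scale was used.
from ultralytics import YOLO

# "yolo12n.pt" is the nano-scale checkpoint; the variant actually trained is unknown.
model = YOLO("yolo12n.pt")

# "lmaf_sfo.yaml" is a hypothetical dataset config pointing at the CT
# slices and bounding-box labels, with two classes: LMAF and SFO.
model.train(
    data="lmaf_sfo.yaml",
    epochs=100,   # illustrative hyperparameters, not the paper's
    imgsz=640,
    batch=16,
)

# Validate on a held-out split; Ultralytics reports mAP50 among its metrics.
metrics = model.val(split="test")
print(f"mAP50: {metrics.box.map50:.3f}")
```

The Shapley-value saliency maps are likewise not described in detail; one plausible sketch uses the shap library's image masker over a classification scoring function. The scoring function below is a random stand-in for the trained model's LMAF/SFO probabilities, so the example is self-contained rather than a reproduction of the study's pipeline.

```python
# pip install shap
import numpy as np
import shap

def score_fragment(images: np.ndarray) -> np.ndarray:
    """Stand-in for the trained model: returns per-class probabilities
    (LMAF, SFO) for a batch of CT crops shaped (N, 224, 224, 3)."""
    rng = np.random.default_rng(0)
    p = rng.uniform(size=(images.shape[0], 1))
    return np.hstack([p, 1 - p])

# Mask image regions via inpainting and attribute score changes to pixels.
masker = shap.maskers.Image("inpaint_telea", (224, 224, 3))
explainer = shap.Explainer(score_fragment, masker, output_names=["LMAF", "SFO"])

ct_crops = np.random.rand(2, 224, 224, 3).astype(np.float32)  # dummy CT crops
shap_values = explainer(ct_crops, max_evals=500, batch_size=32)
shap.image_plot(shap_values)  # overlays the attribution map on each crop
```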

Results: Among the evaluated models, YOLOv12 achieved the highest detection and classification performance, with a mAP50 of 92.1% and an AUC of 0.983 on the external test set, significantly outperforming Faster R-CNN (mAP50 63.7%; AUC 0.79), SSD (mAP50 63.0%; AUC 0.63), and RetinaNet (mAP50 67.0%; AUC 0.73); all P<.001. When using CT alone, the radiologists performed at a moderate level (accuracy: 75.6% and 69.1%; sensitivity: 75.0% and 65.2%; specificity: 76.0% and 71.1%), whereas YOLOv12 approached MRI-based reference performance (accuracy: 92.0%; sensitivity: 86.7%; specificity: 82.2%). Saliency maps corresponded well with expert-identified regions.
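The case-level metrics above (accuracy, sensitivity, specificity, AUC) follow standard definitions. As a self-contained illustration with synthetic labels and scores standing in for the real test-set outputs (nothing below reproduces the study's data), they can be computed as follows:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Synthetic stand-ins: y_true mimics MRI-based reference labels
# (1 = LMAF, 0 = SFO) for a 350-case external test set; y_score is a
# model's predicted LMAF probability. These are NOT the study's data.
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=350)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.2, size=350), 0, 1)
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # recall for the LMAF class
specificity = tn / (tn + fp)  # recall for the SFO class
auc = roc_auc_score(y_true, y_score)
print(f"acc={accuracy:.3f}  sens={sensitivity:.3f}  "
      f"spec={specificity:.3f}  auc={auc:.3f}")
```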

Conclusions: While MRI (read by senior radiologists) is the gold standard for distinguishing LMAFs from SFOs, CT-based differentiation is challenging for radiologists. A CT-only DCNN (YOLOv12) achieved substantially higher performance than radiologists interpreting CT alone and approached the MRI-based reference standard, highlighting its potential to augment CT-based decision-making where MRI is limited or unavailable.

Source Journal
JMIR Medical Informatics (Medicine - Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Annual publications: 173
Review time: 12 weeks
Journal description: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, eHealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (emphasizing applications for clinicians and health professionals rather than consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.