Deep learning diagnosis of adult tibial plateau fractures: multicenter study with external validation.

Radiology advances. Pub Date: 2025-05-09; eCollection Date: 2025-05-01; DOI: 10.1093/radadv/umaf020

Tongtong Huo, Pengran Liu, Mingdi Xue, Jiayao Zhang, Yi Xie, Honglin Wang, Hong Zhou, Zineng Yan, Songxiang Liu, Lin Lu, Jiaming Yang, Wei Wu, Zhewei Ye


Citations: 0

Abstract



Purpose: To evaluate a MobileNetV3-YOLOv8 deep learning (DL) model for detecting tibial plateau fractures (TPFs), including occult TPFs (OTPFs), on knee radiographs. We hypothesized that the DL model would improve diagnostic performance and reduce interpretation time, particularly for less experienced physicians.

Materials and methods: This retrospective, multicenter study included 1543 adult patients from 5 tertiary hospitals in China. A total of 3547 radiographs were included: 2837 for training/validation and 710 from a single external center for testing. In the test set, 267 (37.6%) were normal, 282 (39.7%) were obvious TPFs, and 161 (22.7%) were OTPFs. Performance metrics comprised sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), F1-score, and the area under the receiver operating characteristic curve (AUROC). Eleven physicians (6 experienced, 5 inexperienced) interpreted 70 test images with and without DL assistance. Interreader agreement (Fleiss' κ) and interpretation time were evaluated.
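For readers less familiar with the metrics named above, the following is a minimal sketch of their standard textbook definitions, not the authors' evaluation code. The `Confusion` class, the function names, and all counts in the usage note are hypothetical and chosen only for illustration; no zero-division guards are included.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary confusion matrix (hypothetical container)."""
    tp: int  # fractures correctly flagged
    fp: int  # normal images flagged as fracture
    tn: int  # normal images correctly cleared
    fn: int  # fractures missed


def diagnostic_metrics(c: Confusion) -> dict:
    """Standard definitions of the per-class metrics listed in the abstract."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


def fleiss_kappa(counts: list) -> float:
    """Fleiss' kappa for a table counts[i][j] = number of raters who
    assigned image i to category j. Every row must sum to the same
    number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Observed agreement for each image, then averaged over images.
    p_bar = sum(
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in shares)
    return (p_bar - p_e) / (1 - p_e)
```

As a usage example with made-up counts, `diagnostic_metrics(Confusion(tp=90, fp=10, tn=80, fn=20))` yields a PPV of 0.9, and `fleiss_kappa([[3, 0], [0, 3]])` describes three raters in full agreement on two images.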

Results: For obvious TPFs, the model achieved 89.4% sensitivity (95% confidence interval [CI], 85.7-92.3), 92.5% specificity (95% CI, 89.9-95.1), 88.7% PPV, 92.9% NPV, 89.0% F1-score, and 91.9% accuracy (95% CI, 89.7-94.1). For OTPFs, it achieved 85.7% sensitivity (95% CI, 81.2-89.4), 91.3% specificity (95% CI, 88.5-93.2), 74.2% PPV, 95.6% NPV, 79.5% F1-score, and 88.2% accuracy (95% CI, 86.4-89.8). The overall AUROC was 0.949 (95% CI, 0.935-0.963). DL assistance improved OTPF sensitivity of less experienced readers (67.5% to 83.8%), increased interreader agreement (κ) (0.58 [95% CI, 0.52-0.64] to 0.71 [95% CI, 0.65-0.76]), and reduced mean interpretation time (55.8 seconds to 34.3 seconds).
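As a quick arithmetic cross-check of the figures reported above, the sketch below recomputes the F1-scores from the stated PPV and sensitivity values and derives the relative change in reading time. It uses only numbers from the abstract. The function and variable names are illustrative, and the derived quantities (relative time reduction, percentage-point gain) are not values reported by the authors.

```python
def f1_from(ppv: float, sensitivity: float) -> float:
    """F1 as the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Values as reported in the Results paragraph.
obvious_f1 = f1_from(ppv=0.887, sensitivity=0.894)
occult_f1 = f1_from(ppv=0.742, sensitivity=0.857)

# Derived quantities, not stated in the abstract itself.
time_reduction = (55.8 - 34.3) / 55.8   # relative drop in mean reading time
sensitivity_gain_pts = 83.8 - 67.5      # percentage-point gain for OTPFs

print(f"Obvious TPF F1: {obvious_f1:.1%}")
print(f"Occult TPF F1:  {occult_f1:.1%}")
print(f"Reading time reduced by {time_reduction:.1%}")
print(f"OTPF sensitivity gain: {sensitivity_gain_pts:.1f} points")
```

Both recomputed F1-scores agree with the reported 89.0% and 79.5% after rounding, and the change in mean interpretation time corresponds to a relative reduction of about 38.5%.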

Conclusion: The MobileNetV3-YOLOv8 model accurately detected both obvious and occult TPFs, substantially improving diagnostic sensitivity, interreader agreement, and efficiency. These findings suggest that AI assistance can enhance diagnostic performance and reduce interpretation time, offering considerable benefits for emergency departments where rapid and accurate fracture detection is paramount.


Source journal

Radiology advances

Self-citation rate

0.00%

Articles published