Use of deep learning model for paediatric elbow radiograph binomial classification: initial experience, performance and lessons learnt

IF 1.7 · Zone 4 (Medicine) · JCR Q2 · MEDICINE, GENERAL & INTERNAL

Singapore medical journal Pub Date : 2023-11-29 DOI:10.4103/singaporemedj.smj-2022-078

M. Tan, R. Y. Chua, Qiao Fan, M. Fortier, P. P. Chang


Abstract

In this study, we aimed to compare the performance of a convolutional neural network (CNN)-based deep learning model, trained on a dataset of normal and abnormal paediatric elbow radiographs, with that of paediatric emergency department (ED) physicians on a binomial classification task. A total of 1,314 paediatric lateral elbow radiographs (patient mean age 8.2 years) were retrospectively retrieved and classified, based on annotation, as normal or abnormal (with pathology). They were then randomly partitioned into a development set (993 images); first and second tuning (validation) sets (109 and 100 images, respectively); and a test set (112 images). An artificial intelligence (AI) model was trained on the development set using the EfficientNet B1 network architecture. Its performance on the test set was compared with that of five physicians (inter-rater agreement: fair). The performance of the AI model and that of the physician group were compared using the McNemar test. The accuracy of the AI model on the test set was 80.4% (95% confidence interval [CI] 71.8%–87.3%), and the area under the receiver operating characteristic curve (AUROC) was 0.872 (95% CI 0.831–0.947). The performance of the AI model vs. the physician group on the test set was: sensitivity 79.0% (95% CI: 68.4%–89.5%) vs. 64.9% (95% CI: 52.5%–77.3%; P = 0.088); and specificity 81.8% (95% CI: 71.6%–92.0%) vs. 87.3% (95% CI: 78.5%–96.1%; P = 0.439). The AI model showed good AUROC values and higher sensitivity, with the P-value at nominal significance when compared to the clinician group.
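The sensitivity and specificity figures above can be sanity-checked with a short sketch. The abstract gives no raw counts, so the counts below are inferred from the reported percentages and the 112-image test set (57 abnormal and 55 normal images); they are an assumption, not data from the paper. On those counts, a normal-approximation (Wald) interval gives values consistent with the reported sensitivity and specificity intervals to within rounding, although the abstract does not say which interval method was used, and the reported accuracy interval does not follow from this formula. The `mcnemar_exact` helper illustrates the paired test named in the abstract; the discordant-pair counts passed to it are purely hypothetical, because the abstract does not report them.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wald_ci(successes, total, z=Z_95):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: cases one rater got right and the other got wrong; c: the reverse.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# ASSUMPTION: class sizes and correct-call counts inferred from the reported
# percentages; they are not stated in the abstract.
N_ABNORMAL, N_NORMAL = 57, 55
inferred_counts = {
    "AI sensitivity": (45, N_ABNORMAL),
    "Physician sensitivity": (37, N_ABNORMAL),
    "AI specificity": (45, N_NORMAL),
    "Physician specificity": (48, N_NORMAL),
}

for name, (correct, total) in inferred_counts.items():
    p, lo, hi = wald_ci(correct, total)
    print(f"{name}: {100 * p:.1f}% (95% CI {100 * lo:.1f}%-{100 * hi:.1f}%)")

# HYPOTHETICAL discordant counts, for illustration only.
print(f"Exact McNemar p-value for b=13, c=5: {mcnemar_exact(13, 5):.3f}")
```

The McNemar test uses only the discordant pairs, that is, images on which the model and the physician group disagree. This is why it cannot be recomputed from the marginal sensitivity and specificity values alone.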


Source journal

Singapore medical journal (MEDICINE, GENERAL & INTERNAL)

CiteScore: 3.40
Self-citation rate: 3.70%
Articles published: 149
Review time: 3-6 weeks

About the journal: The Singapore Medical Journal (SMJ) is the monthly publication of the Singapore Medical Association (SMA). The Journal aims to advance medical practice and clinical research by publishing high-quality articles that add to the clinical knowledge of physicians in Singapore and worldwide. SMJ is a general medical journal that focuses on all aspects of human health. The Journal publishes commissioned reviews, commentaries and editorials, original research, a small number of outstanding case reports, continuing medical education articles (ECG Series, Clinics in Diagnostic Imaging, Pictorial Essays, Practice Integration & Life-long Learning [PILL] Series), and short communications in the form of letters to the editor.