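Preparing for downstream tasks in artificial intelligence for dental radiology: a baseline performance comparison of deep learning models.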

IF 2.9 · Zone 2, Medicine · JCR Q1, Dentistry, Oral Surgery & Medicine

Dentomaxillofacial Radiology · Pub Date: 2025-02-01 · DOI: 10.1093/dmfr/twae056

Fara A Fernandes, Mouzhi Ge, Georgi Chaltikyan, Martin W Gerdes, Christian W Omlin

Pages: 149-162 · Publication type: Journal Article · PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784916/pdf/

Citations: 0

Abstract

Objectives: To compare the performance of the convolutional neural network (CNN) with the vision transformer (ViT) and the gated multilayer perceptron (gMLP) in the classification of radiographic images of dental structures.

Methods: Retrospectively collected two-dimensional images derived from cone beam computed tomographic volumes were used to train CNN, ViT, and gMLP architectures as classifiers for four different cases. Cases selected for training the architectures were the classification of the radiographic appearance of maxillary sinuses, maxillary and mandibular incisors, the presence or absence of the mental foramen, and the positional relationship of the mandibular third molar to the inferior alveolar nerve canal. The performance metrics (sensitivity, specificity, precision, accuracy, and F1-score) and the area under the curve (AUC) for the receiver operating characteristic and precision-recall curves were calculated.

Results: The ViT, with an accuracy of 0.74-0.98, performed on par with the CNN model (accuracy 0.71-0.99) in all tasks. The gMLP displayed marginally lower performance (accuracy 0.65-0.98) as compared to the CNN and ViT. For certain tasks, the ViT outperformed the CNN. The AUCs ranged from 0.77 to 1.00 (CNN), 0.80 to 1.00 (ViT), and 0.73 to 1.00 (gMLP) for all of the four cases.

Conclusions: The ViT and gMLP exhibited comparable performance with the CNN (the current state-of-the-art). However, for certain tasks, there was a significant difference in the performance of the ViT and gMLP when compared to the CNN. This difference in model performance for various tasks proves that the capabilities of different architectures may be leveraged.

Advances in knowledge: The vision transformer and the gated multilayer perceptron are deep learning models that exhibit performance comparable to the convolutional neural network in the classification of dental radiographic images.
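For readers unfamiliar with the metrics named in the Methods, the sketch below shows how sensitivity, specificity, precision, accuracy, F1-score, and ROC AUC are conventionally computed for a binary task (for example, presence versus absence of a structure). It is an illustration only, not the authors' evaluation code: the function names `binary_metrics` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary task (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, len(pairs))
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The pairwise AUC above is quadratic in the number of cases, which is fine for a small test set; a multi-class task would need a one-vs-rest or averaged variant, which this sketch does not cover.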


Source journal

Dentomaxillofacial Radiology (category: Medicine - Nuclear Medicine)

CiteScore: 5.60

Self-citation rate: 9.10%

Articles published: n/a

Review time: 4-8 weeks

About the journal: Dentomaxillofacial Radiology (DMFR) is the journal of the International Association of Dentomaxillofacial Radiology (IADMFR) and covers the closely related fields of oral radiology and head and neck imaging. Established in 1972, DMFR is a key resource keeping dentists, radiologists, clinicians, and scientists with an interest in head and neck imaging abreast of important research and developments in oral and maxillofacial radiology. The DMFR editorial board features a panel of international experts, including Editor-in-Chief Professor Ralf Schulze. The editorial board provides its expertise and guidance in shaping the content and direction of the journal.

Quick Facts:
- 2015 Impact Factor: 1.919
- Receipt to first decision: average of 3 weeks
- Acceptance to online publication: average of 3 weeks
- Open access option
- ISSN: 0250-832X
- eISSN: 1476-542X