Fara A Fernandes, Mouzhi Ge, Georgi Chaltikyan, Martin W Gerdes, Christian W Omlin
Dentomaxillofacial Radiology, pages 149-162. Published 2025-02-01. DOI: https://doi.org/10.1093/dmfr/twae056. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784916/pdf/
Preparing for downstream tasks in artificial intelligence for dental radiology: a baseline performance comparison of deep learning models.
Objectives: To compare the performance of the convolutional neural network (CNN) with that of the vision transformer (ViT) and the gated multilayer perceptron (gMLP) in the classification of radiographic images of dental structures.
Methods: Retrospectively collected two-dimensional images derived from cone beam computed tomographic volumes were used to train CNN, ViT, and gMLP architectures as classifiers for four different cases. The cases selected for training the architectures were the classification of the radiographic appearance of maxillary sinuses, maxillary and mandibular incisors, the presence or absence of the mental foramen, and the positional relationship of the mandibular third molar to the inferior alveolar nerve canal. The performance metrics (sensitivity, specificity, precision, accuracy, and f1-score) and the area under the curve (AUC) for the receiver operating characteristic and precision-recall curves were calculated.
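As an illustration of the metrics named in the Methods, the following is a minimal sketch of how they can be computed for a binary task such as the presence or absence of a structure. It is not the authors' evaluation code: the function names `binary_metrics` and `roc_auc`, the confusion counts, and the scores are all hypothetical, and the sketch assumes every denominator is nonzero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the five metrics from binary confusion-matrix counts.

    Assumes all denominators are nonzero (hypothetical helper).
    """
    sensitivity = tp / (tp + fn)              # true positive rate (recall)
    specificity = tn / (tn + fp)              # true negative rate
    precision = tp / (tp + fp)                # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(labels, scores):
    """AUC of the ROC curve via its pairwise interpretation.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up counts and scores, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
auc = roc_auc(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.5, 0.1])
print(metrics)
print(auc)
```

The pairwise form of the AUC is quadratic in the number of samples, which is acceptable for a small test set; a library routine based on sorting would be the usual choice for larger ones. Tasks with more than two classes would need a per-class or averaged variant of each metric.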
Results: The ViT, with an accuracy of 0.74-0.98, performed on par with the CNN model (accuracy 0.71-0.99) in all tasks. The gMLP displayed marginally lower performance (accuracy 0.65-0.98) than the CNN and ViT. For certain tasks, the ViT outperformed the CNN. Across the four cases, the AUCs ranged from 0.77 to 1.00 (CNN), 0.80 to 1.00 (ViT), and 0.73 to 1.00 (gMLP).
Conclusions: The ViT and gMLP exhibited performance comparable to that of the CNN (the current state-of-the-art). However, for certain tasks, there was a significant difference in the performance of the ViT and gMLP when compared to the CNN. This difference in model performance across tasks suggests that the capabilities of different architectures may be leveraged.
About the journal:
Dentomaxillofacial Radiology (DMFR) is the journal of the International Association of Dentomaxillofacial Radiology (IADMFR) and covers the closely related fields of oral radiology and head and neck imaging.
Established in 1972, DMFR is a key resource keeping dentists, radiologists, clinicians, and scientists with an interest in head and neck imaging abreast of important research and developments in oral and maxillofacial radiology.
The DMFR editorial board features a panel of international experts including Editor-in-Chief Professor Ralf Schulze. Our editorial board provide their expertise and guidance in shaping the content and direction of the journal.
Quick Facts:
- 2015 Impact Factor - 1.919
- Receipt to first decision - average of 3 weeks
- Acceptance to online publication - average of 3 weeks
- Open access option
- ISSN: 0250-832X
- eISSN: 1476-542X