Turgut Felek, Hümeyra Tercanlı, Rümeysa Şendişçi Gök
Evaluating vision transformers and convolutional neural networks in the context of dental image processing: a systematic review.
BMC Oral Health 25(1):1626. Journal Article. Published 2025-10-15. DOI: 10.1186/s12903-025-07036-5 (https://doi.org/10.1186/s12903-025-07036-5)
Citation count: 0
Abstract
Background: This systematic review compares the efficacy of convolutional neural networks (CNNs) and Vision Transformers (ViTs) in dental imaging, examining in depth the potential, advantages, and limitations of both model families in this domain.
Methods: The search string used in the study was "(("Vision Transformer" OR ViT OR "Transformer architecture") AND ("Convolutional Neural Network" OR CNN OR ConvNet) AND (Dental OR Dentistry OR "Maxillofacial" OR "Oral Radiology") AND (Image OR Imaging OR Radiograph))". The search was conducted in January 2025. Two investigators independently evaluated the full texts of all eligible articles and excluded those that did not meet the inclusion/exclusion criteria.
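The Boolean structure of this search string (OR within each concept group, AND between groups) can be sketched as follows. This is an illustrative reconstruction only: the term groups are copied from the string above, while `GROUPS` and `build_query` are hypothetical names, not part of the review's tooling.

```python
# Term groups copied from the search string quoted in the Methods.
# GROUPS and build_query are illustrative names (assumptions), not the authors' code.
GROUPS = [
    ['"Vision Transformer"', "ViT", '"Transformer architecture"'],
    ['"Convolutional Neural Network"', "CNN", "ConvNet"],
    ["Dental", "Dentistry", '"Maxillofacial"', '"Oral Radiology"'],
    ["Image", "Imaging", "Radiograph"],
]


def build_query(groups):
    """Join terms with OR inside each group and AND between groups."""
    clauses = ["(" + " OR ".join(terms) + ")" for terms in groups]
    return "(" + " AND ".join(clauses) + ")"


query = build_query(GROUPS)
print(query)
```

Keeping each concept (architecture, comparator, domain, modality) in its own group makes it easy to see that a record must match at least one term from all four groups to be retrieved.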
Results: Of 2596 articles, 21 met the inclusion criteria. By task category, 14 of the 21 reviewed studies (66.7%) addressed classification, while 7 (33.3%) addressed segmentation. Panoramic radiography was the most commonly used imaging modality (52.3%), and ViT-based models were most often reported as having the highest performance (58%).
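The task-category percentages follow directly from the reported counts. A minimal check, assuming only the counts stated above (the `share` helper and variable names are illustrative):

```python
# Counts taken from the Results; `share` is an illustrative helper (assumption).
def share(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


TOTAL = 21
tasks = {"classification": 14, "segmentation": 7}

# The two task categories should account for every included study.
assert sum(tasks.values()) == TOTAL

for name, n in tasks.items():
    print(f"{name}: {n}/{TOTAL} = {share(n, TOTAL)}%")
```

The modality (52.3%) and best-performer (58%) figures are not recomputed here because the abstract does not give their underlying counts.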
Conclusion: ViT-based deep learning models tend to exhibit higher performance in many dental image analysis scenarios compared to traditional convolutional neural networks. However, in practice CNN and ViT approaches can be used in a complementary manner.
About the journal:
BMC Oral Health is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of disorders of the mouth, teeth and gums, as well as related molecular genetics, pathophysiology, and epidemiology.