{"title":"OralTransNet:一种集成了变压器注意力和CNN特征的新型混合模型,用于口腔和口腔疾病的准确诊断","authors":"Sohaib Asif , Vicky Yang Wang , Dong Xu","doi":"10.1016/j.engappai.2025.111609","DOIUrl":null,"url":null,"abstract":"<div><div>The rising prevalence of mouth and oral diseases (MOD), including gum disease and oral cancer, presents a significant global health challenge. Early detection is crucial for effective intervention. However, existing models often rely on complex preprocessing, computationally expensive operations, and specialized resources, leading to inefficiency and limited practicality. This paper presents a novel lightweight hybrid model that combines the local feature extraction strengths of CNNs with the global contextual power of Transformer attention mechanisms, contributing to the advancement of artificial intelligence (AI) in medical image analysis. The proposed architecture integrates the local feature extraction efficiency of convolutional neural networks (CNNs) with the global context modeling strength of Transformers. This combination enables the model to effectively capture both fine-grained details and broader spatial patterns, while maintaining low computational complexity. By leveraging CNNs' weight-sharing properties for efficient feature extraction and Transformers' ability to model global patterns, the proposed model performs well across datasets of varying sizes and complexities. Its lightweight design emphasizes efficiency, with fewer parameters, reduced floating-point operations (FLOPs), and shorter inference times, making it ideal for real-time AI applications, particularly in resource-constrained environments. The proposed model is also well-suited for deployment on mobile devices and in regions with limited medical infrastructure, providing a scalable solution for early diagnosis in diverse healthcare settings. In the context of medical engineering, the proposed model is applied to the automated detection of mouth and oral diseases (MOD) using both clinical and histopathological images. This approach aims to enhance diagnostic capabilities in resource-constrained clinical environments. The model is rigorously evaluated on three datasets: the MOD dataset (5143 images, 7 classes), the Oral Cancer dataset (241 images, 2 classes), and the Histopathological Oral Cancer dataset (5192 images, 2 classes). The proposed model achieves accuracies of 99.03 %, 97.83 %, and 94.23 %, respectively—surpassing several state-of-the-art (SOTA) models. Its strong performance, lightweight design, and enhanced interpretability position it as a practical and scalable solution for early and reliable oral disease detection in diverse clinical settings.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111609"},"PeriodicalIF":8.0000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"OralTransNet: A novel hybrid model integrating transformer attention and CNN features for accurate diagnosis of mouth and oral diseases\",\"authors\":\"Sohaib Asif , Vicky Yang Wang , Dong Xu\",\"doi\":\"10.1016/j.engappai.2025.111609\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The rising prevalence of mouth and oral diseases (MOD), including gum disease and oral cancer, presents a significant global health challenge. Early detection is crucial for effective intervention. 
However, existing models often rely on complex preprocessing, computationally expensive operations, and specialized resources, leading to inefficiency and limited practicality. This paper presents a novel lightweight hybrid model that combines the local feature extraction strengths of CNNs with the global contextual power of Transformer attention mechanisms, contributing to the advancement of artificial intelligence (AI) in medical image analysis. The proposed architecture integrates the local feature extraction efficiency of convolutional neural networks (CNNs) with the global context modeling strength of Transformers. This combination enables the model to effectively capture both fine-grained details and broader spatial patterns, while maintaining low computational complexity. By leveraging CNNs' weight-sharing properties for efficient feature extraction and Transformers' ability to model global patterns, the proposed model performs well across datasets of varying sizes and complexities. Its lightweight design emphasizes efficiency, with fewer parameters, reduced floating-point operations (FLOPs), and shorter inference times, making it ideal for real-time AI applications, particularly in resource-constrained environments. The proposed model is also well-suited for deployment on mobile devices and in regions with limited medical infrastructure, providing a scalable solution for early diagnosis in diverse healthcare settings. In the context of medical engineering, the proposed model is applied to the automated detection of mouth and oral diseases (MOD) using both clinical and histopathological images. This approach aims to enhance diagnostic capabilities in resource-constrained clinical environments. The model is rigorously evaluated on three datasets: the MOD dataset (5143 images, 7 classes), the Oral Cancer dataset (241 images, 2 classes), and the Histopathological Oral Cancer dataset (5192 images, 2 classes). The proposed model achieves accuracies of 99.03 %, 97.83 %, and 94.23 %, respectively—surpassing several state-of-the-art (SOTA) models. 
Its strong performance, lightweight design, and enhanced interpretability position it as a practical and scalable solution for early and reliable oral disease detection in diverse clinical settings.</div></div>\",\"PeriodicalId\":50523,\"journal\":{\"name\":\"Engineering Applications of Artificial Intelligence\",\"volume\":\"159 \",\"pages\":\"Article 111609\"},\"PeriodicalIF\":8.0000,\"publicationDate\":\"2025-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Applications of Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0952197625016112\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197625016112","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
OralTransNet: A novel hybrid model integrating transformer attention and CNN features for accurate diagnosis of mouth and oral diseases
The rising prevalence of mouth and oral diseases (MOD), including gum disease and oral cancer, presents a significant global health challenge, and early detection is crucial for effective intervention. However, existing models often rely on complex preprocessing, computationally expensive operations, and specialized resources, leading to inefficiency and limited practicality. This paper presents a novel lightweight hybrid model that combines the local feature extraction efficiency of convolutional neural networks (CNNs) with the global context modeling strength of Transformer attention mechanisms, contributing to the advancement of artificial intelligence (AI) in medical image analysis. This combination enables the model to capture both fine-grained details and broader spatial patterns while maintaining low computational complexity. By leveraging CNNs' weight-sharing properties for efficient feature extraction and Transformers' ability to model global patterns, the proposed model performs well across datasets of varying sizes and complexities. Its lightweight design emphasizes efficiency, with fewer parameters, fewer floating-point operations (FLOPs), and shorter inference times, making it well suited to real-time AI applications, particularly in resource-constrained environments. The model is also well suited for deployment on mobile devices and in regions with limited medical infrastructure, providing a scalable solution for early diagnosis in diverse healthcare settings. In the context of medical engineering, the model is applied to the automated detection of MOD using both clinical and histopathological images, with the aim of enhancing diagnostic capabilities in resource-constrained clinical environments. The model is rigorously evaluated on three datasets: the MOD dataset (5143 images, 7 classes), the Oral Cancer dataset (241 images, 2 classes), and the Histopathological Oral Cancer dataset (5192 images, 2 classes), achieving accuracies of 99.03%, 97.83%, and 94.23%, respectively, and surpassing several state-of-the-art (SOTA) models. Its strong performance, lightweight design, and enhanced interpretability position it as a practical and scalable solution for early and reliable oral disease detection in diverse clinical settings.
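Because the abstract describes the architecture only at a high level, the following is a minimal sketch, in PyTorch, of the kind of hybrid design it names: a CNN stem for local feature extraction, Transformer self-attention over the resulting feature map for global context, and a lightweight classification head. The class name `HybridCNNTransformer` and all layer widths, depths, and head counts are illustrative assumptions, not the published OralTransNet configuration.

```python
# Minimal sketch of a hybrid CNN + Transformer-attention classifier.
# Assumed/illustrative: channel sizes, depth, number of heads, and input size.
import torch
import torch.nn as nn

class HybridCNNTransformer(nn.Module):
    def __init__(self, num_classes: int = 7, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        # CNN stem: weight-sharing convolutions capture local, fine-grained detail.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim), nn.ReLU(inplace=True),
        )
        # Transformer self-attention over spatial tokens models global context.
        self.norm = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(x)                              # (B, C, H, W) local features
        tokens = feats.flatten(2).transpose(1, 2)        # (B, H*W, C) spatial tokens
        tokens = self.norm(tokens)
        attended, _ = self.attn(tokens, tokens, tokens)  # global self-attention
        pooled = attended.mean(dim=1)                    # average over tokens
        return self.head(pooled)

if __name__ == "__main__":
    model = HybridCNNTransformer(num_classes=7)          # e.g. the 7-class MOD setting
    logits = model(torch.randn(1, 3, 224, 224))          # dummy clinical image
    n_params = sum(p.numel() for p in model.parameters())
    print(logits.shape, f"{n_params / 1e6:.2f}M parameters")
```

The `__main__` block runs a dummy forward pass on a 224 x 224 image and reports the parameter count, the kind of figure one would weigh against the FLOPs and inference-time claims above when judging suitability for mobile or resource-constrained deployment.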
Journal introduction:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.