{"title":"OralTransNet:一种集成了变压器注意力和CNN特征的新型混合模型,用于口腔和口腔疾病的准确诊断","authors":"Sohaib Asif , Vicky Yang Wang , Dong Xu","doi":"10.1016/j.engappai.2025.111609","DOIUrl":null,"url":null,"abstract":"<div><div>The rising prevalence of mouth and oral diseases (MOD), including gum disease and oral cancer, presents a significant global health challenge. Early detection is crucial for effective intervention. However, existing models often rely on complex preprocessing, computationally expensive operations, and specialized resources, leading to inefficiency and limited practicality. This paper presents a novel lightweight hybrid model that combines the local feature extraction strengths of CNNs with the global contextual power of Transformer attention mechanisms, contributing to the advancement of artificial intelligence (AI) in medical image analysis. The proposed architecture integrates the local feature extraction efficiency of convolutional neural networks (CNNs) with the global context modeling strength of Transformers. This combination enables the model to effectively capture both fine-grained details and broader spatial patterns, while maintaining low computational complexity. By leveraging CNNs' weight-sharing properties for efficient feature extraction and Transformers' ability to model global patterns, the proposed model performs well across datasets of varying sizes and complexities. Its lightweight design emphasizes efficiency, with fewer parameters, reduced floating-point operations (FLOPs), and shorter inference times, making it ideal for real-time AI applications, particularly in resource-constrained environments. The proposed model is also well-suited for deployment on mobile devices and in regions with limited medical infrastructure, providing a scalable solution for early diagnosis in diverse healthcare settings. In the context of medical engineering, the proposed model is applied to the automated detection of mouth and oral diseases (MOD) using both clinical and histopathological images. This approach aims to enhance diagnostic capabilities in resource-constrained clinical environments. The model is rigorously evaluated on three datasets: the MOD dataset (5143 images, 7 classes), the Oral Cancer dataset (241 images, 2 classes), and the Histopathological Oral Cancer dataset (5192 images, 2 classes). The proposed model achieves accuracies of 99.03 %, 97.83 %, and 94.23 %, respectively—surpassing several state-of-the-art (SOTA) models. Its strong performance, lightweight design, and enhanced interpretability position it as a practical and scalable solution for early and reliable oral disease detection in diverse clinical settings.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"159 ","pages":"Article 111609"},"PeriodicalIF":8.0000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"OralTransNet: A novel hybrid model integrating transformer attention and CNN features for accurate diagnosis of mouth and oral diseases\",\"authors\":\"Sohaib Asif , Vicky Yang Wang , Dong Xu\",\"doi\":\"10.1016/j.engappai.2025.111609\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The rising prevalence of mouth and oral diseases (MOD), including gum disease and oral cancer, presents a significant global health challenge. Early detection is crucial for effective intervention. 
However, existing models often rely on complex preprocessing, computationally expensive operations, and specialized resources, leading to inefficiency and limited practicality. This paper presents a novel lightweight hybrid model that combines the local feature extraction strengths of CNNs with the global contextual power of Transformer attention mechanisms, contributing to the advancement of artificial intelligence (AI) in medical image analysis. The proposed architecture integrates the local feature extraction efficiency of convolutional neural networks (CNNs) with the global context modeling strength of Transformers. This combination enables the model to effectively capture both fine-grained details and broader spatial patterns, while maintaining low computational complexity. By leveraging CNNs' weight-sharing properties for efficient feature extraction and Transformers' ability to model global patterns, the proposed model performs well across datasets of varying sizes and complexities. Its lightweight design emphasizes efficiency, with fewer parameters, reduced floating-point operations (FLOPs), and shorter inference times, making it ideal for real-time AI applications, particularly in resource-constrained environments. The proposed model is also well-suited for deployment on mobile devices and in regions with limited medical infrastructure, providing a scalable solution for early diagnosis in diverse healthcare settings. In the context of medical engineering, the proposed model is applied to the automated detection of mouth and oral diseases (MOD) using both clinical and histopathological images. This approach aims to enhance diagnostic capabilities in resource-constrained clinical environments. The model is rigorously evaluated on three datasets: the MOD dataset (5143 images, 7 classes), the Oral Cancer dataset (241 images, 2 classes), and the Histopathological Oral Cancer dataset (5192 images, 2 classes). The proposed model achieves accuracies of 99.03 %, 97.83 %, and 94.23 %, respectively—surpassing several state-of-the-art (SOTA) models. 
Its strong performance, lightweight design, and enhanced interpretability position it as a practical and scalable solution for early and reliable oral disease detection in diverse clinical settings.</div></div>\",\"PeriodicalId\":50523,\"journal\":{\"name\":\"Engineering Applications of Artificial Intelligence\",\"volume\":\"159 \",\"pages\":\"Article 111609\"},\"PeriodicalIF\":8.0000,\"publicationDate\":\"2025-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Applications of Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0952197625016112\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197625016112","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
OralTransNet: A novel hybrid model integrating transformer attention and CNN features for accurate diagnosis of mouth and oral diseases
The rising prevalence of mouth and oral diseases (MOD), including gum disease and oral cancer, presents a significant global health challenge, and early detection is crucial for effective intervention. However, existing models often rely on complex preprocessing, computationally expensive operations, and specialized resources, leading to inefficiency and limited practicality. This paper presents a novel lightweight hybrid model that combines the local feature extraction efficiency of convolutional neural networks (CNNs) with the global context modeling strength of Transformer attention mechanisms, contributing to the advancement of artificial intelligence (AI) in medical image analysis. This combination enables the model to capture both fine-grained details and broader spatial patterns while maintaining low computational complexity. By leveraging CNNs' weight-sharing properties for efficient feature extraction and Transformers' ability to model global patterns, the proposed model performs well across datasets of varying sizes and complexities. Its lightweight design emphasizes efficiency, with fewer parameters, fewer floating-point operations (FLOPs), and shorter inference times, making it well suited to real-time AI applications, particularly in resource-constrained environments. The model is also well suited for deployment on mobile devices and in regions with limited medical infrastructure, providing a scalable solution for early diagnosis in diverse healthcare settings. In the context of medical engineering, the model is applied to the automated detection of MOD using both clinical and histopathological images, with the aim of enhancing diagnostic capabilities in resource-constrained clinical environments. The model is rigorously evaluated on three datasets: the MOD dataset (5143 images, 7 classes), the Oral Cancer dataset (241 images, 2 classes), and the Histopathological Oral Cancer dataset (5192 images, 2 classes), achieving accuracies of 99.03%, 97.83%, and 94.23%, respectively, and surpassing several state-of-the-art (SOTA) models. Its strong performance, lightweight design, and enhanced interpretability position it as a practical and scalable solution for early and reliable oral disease detection in diverse clinical settings.
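Because the abstract describes the architecture only at a high level, the following is a minimal sketch, in PyTorch, of the kind of hybrid design it names: a CNN stem for local feature extraction, Transformer self-attention over the resulting feature map for global context, and a lightweight classification head. The class name `HybridCNNTransformer` and all layer widths, depths, and head counts are illustrative assumptions, not the published OralTransNet configuration.

```python
# Minimal sketch of a hybrid CNN + Transformer-attention classifier.
# Assumed/illustrative: channel sizes, depth, number of heads, and input size.
import torch
import torch.nn as nn

class HybridCNNTransformer(nn.Module):
    def __init__(self, num_classes: int = 7, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        # CNN stem: weight-sharing convolutions capture local, fine-grained detail.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim), nn.ReLU(inplace=True),
        )
        # Transformer self-attention over spatial tokens models global context.
        self.norm = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(x)                              # (B, C, H, W) local features
        tokens = feats.flatten(2).transpose(1, 2)        # (B, H*W, C) spatial tokens
        tokens = self.norm(tokens)
        attended, _ = self.attn(tokens, tokens, tokens)  # global self-attention
        pooled = attended.mean(dim=1)                    # average over tokens
        return self.head(pooled)

if __name__ == "__main__":
    model = HybridCNNTransformer(num_classes=7)          # e.g. the 7-class MOD setting
    logits = model(torch.randn(1, 3, 224, 224))          # dummy clinical image
    n_params = sum(p.numel() for p in model.parameters())
    print(logits.shape, f"{n_params / 1e6:.2f}M parameters")
```

The `__main__` block runs a dummy forward pass on a 224 x 224 image and reports the parameter count, the kind of figure one would weigh against the FLOPs and inference-time claims above when judging suitability for mobile or resource-constrained deployment.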
Journal introduction:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.