Khaja Mannanuddin, V.R. Vimal, Angalkuditi Srinivas, S.D. Uma Mageswari, G. Mahendran, J. Ramya, Ashok Kumar, Pranjal Das, R.G. Vidhya
Journal of Intelligent & Fuzzy Systems (IF 1.7; JCR Q3, Computer Science, Artificial Intelligence). Published 2023-10-25. DOI: 10.3233/jifs-235055 (https://doi.org/10.3233/jifs-235055)
Enhancing medical image analysis: A fusion of fully connected neural network classifier with CNN-VIT for improved retinal disease detection
Diseases of the retina remain a leading cause of blindness and visual impairment worldwide. In medical image analysis, and retinal disease identification in particular, deep learning techniques such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown remarkable potential. In this paper, we present a novel method for detecting retinal diseases that combines the advantages of the Inception-V3, ResNet-50, and Vision Transformer architectures in a single model, the Cascade CNN-ViT. The proposed model extracts local features from retinal images by leveraging the spatial hierarchy learning capabilities of Inception-V3 and ResNet-50. The Vision Transformer takes these local features and uses self-attention to capture global context and long-range dependencies. By merging the output representations of the CNNs and the Vision Transformer, the model combines fine-grained local information with semantically significant global contextual cues. We undertake comprehensive experiments on a large and varied dataset of multimodal retinal images to evaluate the proposed technique. The experimental findings show that the Cascade CNN-ViT model outperforms standalone CNNs and Vision Transformers. The model is also robust across all classes of retinal disease and copes with the complications introduced by multiple image types. Overall, the results demonstrate the power of cascading Inception-V3, ResNet-50, and Vision Transformer architectures for improved retinal disease diagnosis. The proposed approach could have important consequences for early detection and timely intervention, potentially improving the management of retinal diseases and preserving visual health.
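The data flow described in the abstract (CNN local features, self-attention over those features, merged representation, fully connected classifier) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give layer sizes, the attention configuration, or the fusion rule. The sketch assumes single-head attention with identity projections, fusion by concatenating mean-pooled features, random stand-ins for the Inception-V3 and ResNet-50 feature tokens, and four disease classes. All function names are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption: Q, K and V are the tokens themselves (identity projections);
    a real ViT learns separate projection matrices and uses several heads.
    """
    d = len(tokens[0])
    attended = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        )
    return attended


def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def classify(features, weights, biases):
    """Fully connected layer followed by softmax."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def cascade_forward(cnn_a_tokens, cnn_b_tokens, weights, biases):
    """Local CNN tokens -> self-attention -> concatenated fusion -> classifier."""
    local_tokens = cnn_a_tokens + cnn_b_tokens      # tokens from both CNN branches
    global_tokens = self_attention(local_tokens)    # global context per token
    fused = mean_pool(local_tokens) + mean_pool(global_tokens)  # concatenation
    return classify(fused, weights, biases)


# Random stand-ins for CNN feature tokens (3 tokens per branch, dimension 4).
rng = random.Random(0)
dim, num_classes = 4, 4
cnn_a_tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
cnn_b_tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
weights = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(num_classes)]
biases = [0.0] * num_classes

probs = cascade_forward(cnn_a_tokens, cnn_b_tokens, weights, biases)
print(probs)  # one probability per hypothetical disease class
```

In this sketch the classifier sees both the pooled local features and the pooled attended features, which is one simple way to read "merging the output representations"; the paper may use a different fusion rule.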
About the journal:
The purpose of the Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology is to foster advancements of knowledge and help disseminate results concerning recent applications and case studies in the areas of fuzzy logic, intelligent systems, and web-based applications among working professionals and professionals in education and research, covering a broad cross-section of technical disciplines.