Enhancing medical image analysis: A fusion of fully connected neural network classifier with CNN-VIT for improved retinal disease detection

IF 1.0; Zone 4 (Computer Science); JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Intelligent & Fuzzy Systems Pub Date : 2023-10-25 DOI:10.3233/jifs-235055

Khaja Mannanuddin, V.R. Vimal, Angalkuditi Srinivas, S.D. Uma Mageswari, G. Mahendran, J. Ramya, Ashok Kumar, Pranjal Das, R.G. Vidhya

Citations: 0

Abstract

Enhancing medical image analysis: A fusion of fully connected neural network classifier with CNN-VIT for improved retinal disease detection

Diseases of the retina continue to be a leading cause of blindness and visual impairment around the world. In medical image analysis, and retinal disease identification in particular, deep learning techniques such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown remarkable potential. In this paper, we present a method for detecting retinal diseases that combines the advantages of the Inception-V3, ResNet-50, and Vision Transformer architectures in a single model called Cascade CNN-ViT. The proposed Cascade CNN-ViT model extracts local features from retinal images by leveraging the spatial hierarchy learning capabilities of Inception-V3 and ResNet-50. The Vision Transformer takes these local features and uses self-attention mechanisms to capture global context and long-range dependencies. By merging the output representations of the CNNs and the Vision Transformer, the model combines fine-grained local information with semantically significant global contextual cues. We conduct comprehensive experiments on a large and varied dataset of multimodal retinal images to evaluate the performance of the proposed technique. The experimental findings show that the Cascade CNN-ViT model outperforms standalone CNNs and Vision Transformers. The model is also robust across all classes of retinal disease and successfully handles the complications introduced by multiple image types. Overall, the results demonstrate the power of cascading the Inception-V3, ResNet-50, and Vision Transformer architectures for improved retinal disease diagnosis. By potentially improving the management of retinal diseases and preserving visual health, the proposed approach could have important consequences for early detection and timely intervention.
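The cascade described in the abstract (CNN features, then self-attention, then fusion, then a fully connected classifier) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation. The abstract does not say how the CNN features are tokenized, how the representations are merged, or how many disease classes are used, so every such choice below is an assumption: random vectors stand in for the Inception-V3 and ResNet-50 outputs, attention is single-head with identity projections, fusion is mean pooling followed by concatenation, and the names `cascade_forward`, `DIM`, and `NUM_CLASSES` are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections.

    A real ViT learns separate projection matrices; they are omitted here.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        attended.append([sum(w * tok[i] for w, tok in zip(weights, tokens))
                         for i in range(dim)])
    return attended


def mean_pool(tokens):
    """Average a list of token vectors into a single vector."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def cascade_forward(inception_tokens, resnet_tokens, fc_weights, fc_bias):
    """Hypothetical forward pass: CNN tokens -> attention -> fusion -> FC classifier."""
    cnn_tokens = inception_tokens + resnet_tokens            # stack both token sequences
    local_summary = mean_pool(cnn_tokens)                    # local (CNN) representation
    global_summary = mean_pool(self_attention(cnn_tokens))   # global-context representation
    fused = local_summary + global_summary                   # assumed fusion: concatenation
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)


def random_tokens(rng, count, dim):
    """Random vectors used as stand-ins for real backbone features or weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM, NUM_CLASSES = 8, 4                              # illustrative sizes only
inception_tokens = random_tokens(rng, 5, DIM)        # stand-in for Inception-V3 features
resnet_tokens = random_tokens(rng, 3, DIM)           # stand-in for ResNet-50 features
fc_weights = random_tokens(rng, NUM_CLASSES, 2 * DIM)
fc_bias = [0.0] * NUM_CLASSES

probabilities = cascade_forward(inception_tokens, resnet_tokens, fc_weights, fc_bias)
print(probabilities)
```

The output is a probability vector over the assumed classes. In a real system the random stand-ins would be replaced by pretrained backbone features and trained attention and classifier weights.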

Source journal

Journal of Intelligent & Fuzzy Systems (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 3.40

Self-citation rate: 10.00%

Articles published: 965

Review time: 5.1 months

About the journal: The purpose of the Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology is to foster advancements of knowledge and help disseminate results concerning recent applications and case studies in the areas of fuzzy logic, intelligent systems, and web-based applications among working professionals and professionals in education and research, covering a broad cross-section of technical disciplines.