VA-TransUNet: A U-shaped Medical Image Segmentation Network with Visual Attention

Ting Jiang, Tao Xu, Xiaoning Li
{"title":"基于视觉注意的u型医学图像分割网络VA-TransUNet","authors":"Ting Jiang, Tao Xu, Xiaoning Li","doi":"10.1145/3581807.3581826","DOIUrl":null,"url":null,"abstract":"Abstract: Medical image segmentation is clinically important in medical diagnosis as it permits superior lesion detection in medical diagnosis to help physicians assist in treatment. Vision Transformer (ViT) has achieved remarkable results in computer vision and has been used for image segmentation tasks, but the potential in medical image segmentation remains largely unexplored with the special characteristics of medical images. Moreover, ViT based on multi-head self-attention (MSA) converts the image into a one-dimensional sequence, which destroys the two-dimensional structure of the image. Therefore, we propose VA-TransUNet, which combines the advantages of Transformer and Convolutional Neural Networks (CNN) to capture global and local contextual information and consider the features of channel dimensionality. Transformer based on visual attention is adopted, it is taken as the encoder, CNN is used as the decoder, and the image is directly fed into the Transformer. The key of visual attention is the large kernel attention (LKA), which is a depth-wise separable convolution that decomposes a large convolution into various convolutions. Experiment on Synapse of abdominal multi-organ (Synapse) and Automated Cardiac Diagnosis Challenge (ACDC) datasets demonstrate that we proposed VA-TransUNet outperforms the current the-state-of-art networks. 
The codes and trained models will be publicly and available at https://github.com/BeautySilly/VA-TransUNet.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"VA-TransUNet: A U-shaped Medical Image Segmentation Network with Visual Attention\",\"authors\":\"Ting Jiang, Tao Xu, Xiaoning Li\",\"doi\":\"10.1145/3581807.3581826\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract: Medical image segmentation is clinically important in medical diagnosis as it permits superior lesion detection in medical diagnosis to help physicians assist in treatment. Vision Transformer (ViT) has achieved remarkable results in computer vision and has been used for image segmentation tasks, but the potential in medical image segmentation remains largely unexplored with the special characteristics of medical images. Moreover, ViT based on multi-head self-attention (MSA) converts the image into a one-dimensional sequence, which destroys the two-dimensional structure of the image. Therefore, we propose VA-TransUNet, which combines the advantages of Transformer and Convolutional Neural Networks (CNN) to capture global and local contextual information and consider the features of channel dimensionality. Transformer based on visual attention is adopted, it is taken as the encoder, CNN is used as the decoder, and the image is directly fed into the Transformer. The key of visual attention is the large kernel attention (LKA), which is a depth-wise separable convolution that decomposes a large convolution into various convolutions. 
Experiment on Synapse of abdominal multi-organ (Synapse) and Automated Cardiac Diagnosis Challenge (ACDC) datasets demonstrate that we proposed VA-TransUNet outperforms the current the-state-of-art networks. The codes and trained models will be publicly and available at https://github.com/BeautySilly/VA-TransUNet.\",\"PeriodicalId\":292813,\"journal\":{\"name\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3581807.3581826\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3581807.3581826","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
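The abstract states that LKA decomposes a large convolution into several smaller convolutions. As a rough illustration of why this is attractive, the sketch below follows the decomposition used in the Visual Attention Network that LKA originates from: a K×K convolution is approximated by a (2d−1)×(2d−1) depth-wise convolution, a ⌈K/d⌉×⌈K/d⌉ depth-wise dilated convolution (dilation d), and a 1×1 point-wise convolution. The values K=21, d=3, C=64 are illustrative assumptions, not configurations confirmed for VA-TransUNet.

```python
import math

def lka_parameter_budget(K: int, d: int, C: int):
    """Compare the weight count of a dense K x K convolution over C channels
    with the LKA-style decomposition, and report the effective receptive
    field of the two stacked depth-wise convolutions."""
    dw_k = 2 * d - 1            # depth-wise conv kernel size
    dwd_k = math.ceil(K / d)    # depth-wise dilated conv kernel size
    # Receptive field: the dilated kernel spans (dwd_k - 1) * d + 1 pixels,
    # and stacking the dw_k x dw_k conv adds (dw_k - 1) more.
    rf = (dwd_k - 1) * d + 1 + (dw_k - 1)
    # Depth-wise convs cost C * k^2 weights; the 1x1 conv costs C * C.
    decomposed = C * dw_k ** 2 + C * dwd_k ** 2 + C * C
    dense = C * C * K ** 2      # a dense K x K convolution for comparison
    return dw_k, dwd_k, rf, decomposed, dense

if __name__ == "__main__":
    dw_k, dwd_k, rf, decomposed, dense = lka_parameter_budget(K=21, d=3, C=64)
    print(f"depth-wise kernel: {dw_k}x{dw_k}, dilated kernel: {dwd_k}x{dwd_k}")
    print(f"effective receptive field: {rf} (target: 21)")
    print(f"weights: {decomposed} decomposed vs {dense} dense")
```

Under these assumed settings, the decomposition yields a 5×5 depth-wise conv and a 7×7 dilated depth-wise conv whose stacked receptive field (23) covers the 21×21 target, while using roughly 200× fewer weights than the dense convolution at 64 channels.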