{"title":"MaS-TransUNet: A Multiattention Swin Transformer U-Net for Medical Image Segmentation","authors":"Ashwini Kumar Upadhyay;Ashish Kumar Bhandari","doi":"10.1109/TRPMS.2024.3477528","DOIUrl":null,"url":null,"abstract":"U-shaped encoder-decoder models have excelled in automatic medical image segmentation due to their hierarchical feature learning capabilities, robustness, and upgradability. Purely CNN-based models are excellent at extracting local details but struggle with long-range dependencies, whereas transformer-based models excel in global context modeling but have higher data and computational requirements. Self-attention-based transformers and other attention mechanisms have been shown to enhance segmentation accuracy in the encoder-decoder framework. Drawing from these challenges and opportunities, we propose a novel multiattention Swin transformer U-net (MaS-TransUNet) model, incorporating self-attention, edge attention, channel attention, and feedback attention. MaS-TransUNet leverages the strengths of both CNNs and transformers within a U-shaped encoder-decoder framework. For self-attention, we developed modules using Swin Transformer blocks, offering hierarchical feature representations. We designed specialized modules, including an edge attention module (EAM) to guide the network with edge information, a feedback attention module (FAM) to utilize previous epoch segmentation masks for refining subsequent predictions, and a channel attention module (CAM) to focus on relevant feature channels. We also introduced advanced data augmentation, regularizations, and an optimal training scheme for enhanced training. Comprehensive experiments across five diverse medical image segmentation datasets demonstrate that MaS-TransUNet significantly outperforms existing state-of-the-art methods while maintaining computational efficiency. It achieves the highest-Dice scores of 0.903, 0.841, 0.908, 0.906, and 0.906 on the Cancer genome atlas low-grade glioma Brain MRI, COVID-19 Lung CT, data science bowl-2018, Kvasir-SEG, and international skin imaging collaboration-2018 datasets, respectively. These results highlight the model’s robustness and versatility, consistently delivering exceptional performance without modality-specific adaptations.","PeriodicalId":46807,"journal":{"name":"IEEE Transactions on Radiation and Plasma Medical Sciences","volume":"9 5","pages":"613-626"},"PeriodicalIF":4.6000,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Radiation and Plasma Medical Sciences","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10713266/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Abstract
U-shaped encoder-decoder models have excelled in automatic medical image segmentation due to their hierarchical feature learning capabilities, robustness, and upgradability. Purely CNN-based models are excellent at extracting local details but struggle with long-range dependencies, whereas transformer-based models excel at global context modeling but have higher data and computational requirements. Self-attention-based transformers and other attention mechanisms have been shown to enhance segmentation accuracy within the encoder-decoder framework. Motivated by these challenges and opportunities, we propose a novel multiattention Swin Transformer U-Net (MaS-TransUNet) model incorporating self-attention, edge attention, channel attention, and feedback attention. MaS-TransUNet leverages the strengths of both CNNs and transformers within a U-shaped encoder-decoder framework. For self-attention, we develop modules built on Swin Transformer blocks, which provide hierarchical feature representations. We design specialized modules, including an edge attention module (EAM) to guide the network with edge information, a feedback attention module (FAM) that uses the previous epoch's segmentation masks to refine subsequent predictions, and a channel attention module (CAM) to focus on relevant feature channels. We also introduce advanced data augmentation, regularization techniques, and an optimized training scheme to improve training. Comprehensive experiments across five diverse medical image segmentation datasets demonstrate that MaS-TransUNet significantly outperforms existing state-of-the-art methods while maintaining computational efficiency. It achieves the highest Dice scores of 0.903, 0.841, 0.908, 0.906, and 0.906 on The Cancer Genome Atlas (TCGA) low-grade glioma brain MRI, COVID-19 lung CT, Data Science Bowl 2018, Kvasir-SEG, and International Skin Imaging Collaboration (ISIC) 2018 datasets, respectively. These results highlight the model's robustness and versatility, consistently delivering exceptional performance without modality-specific adaptations.
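The abstract mentions a channel attention module (CAM) that emphasizes relevant feature channels but does not specify its design. As an illustration only, the sketch below shows a common squeeze-and-excitation-style channel attention block in PyTorch; the class name, reduction ratio, and placement are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch, not the paper's CAM: assumes a standard
# squeeze-and-excitation style channel attention over 2-D feature maps.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Re-weights feature channels using global context (SE-style)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: one scalar per channel via global average pooling.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: learn per-channel weights in (0, 1).
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # scale each channel by its learned weight


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)   # (batch, channels, H, W) decoder features
    cam = ChannelAttention(channels=64)
    print(cam(feats).shape)              # torch.Size([2, 64, 32, 32])
```

In a U-shaped decoder, such a block would typically be applied to fused skip-connection features before the next upsampling stage, so that uninformative channels are suppressed at low cost; the exact integration point in MaS-TransUNet is described in the full paper.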