Optimizing transformer-based network via advanced decoder design for medical image segmentation.

Impact Factor 1.3, JCR Q3 (Radiology, Nuclear Medicine & Medical Imaging)

Biomedical Physics & Engineering Express. Pub Date: 2025-02-05. DOI: 10.1088/2057-1976/adaec7

Weibin Yang, Zhiqi Dong, Mingyuan Xu, Longwei Xu, Dehua Geng, Yusong Li, Pengwei Wang

Citations: 0

Abstract

U-Net is widely used in medical image segmentation due to its simple and flexible architecture. To address the challenges of scale and complexity in medical tasks, several variants of U-Net have been proposed. In particular, methods based on the Vision Transformer (ViT), represented by Swin UNETR, have gained widespread attention in recent years. However, these improvements often focus on the encoder, overlooking the crucial role of the decoder in optimizing segmentation details. This design imbalance limits the potential for further enhancing segmentation performance. To address this issue, we analyze the roles of various decoder components, including the upsampling method, the skip connection, and the feature extraction module, as well as the shortcomings of existing methods. Consequently, we propose Swin DER (i.e., Swin UNETR Decoder Enhanced and Refined), which specifically optimizes the design of these three components. Swin DER performs upsampling using a learnable interpolation algorithm called offset coordinate neighborhood weighted up sampling (Onsampling) and replaces the traditional skip connection with a spatial-channel parallel attention gate (SCP AG). Additionally, Swin DER introduces deformable convolution along with an attention mechanism in the feature extraction module of the decoder. Our model design achieves excellent results, surpassing other state-of-the-art methods on both the Synapse dataset and the MSD brain tumor segmentation task. Code is available at: https://github.com/WillBeanYang/Swin-DER.
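The abstract describes Onsampling only as a learnable interpolation that works with offset coordinates and weights neighboring samples. As a rough intuition for that idea, and not the authors' implementation, the following toy 1D sketch shifts each output sample's source coordinate by an offset and blends the two nearest inputs. The function name `offset_upsample_1d`, the half-pixel coordinate mapping, and the linear weighting are all assumptions made for illustration; in a trainable model the offsets would presumably be predicted by a learned layer rather than passed in by hand.

```python
import math
from typing import List, Optional, Sequence


def offset_upsample_1d(
    signal: Sequence[float],
    scale: int,
    offsets: Optional[Sequence[float]] = None,
) -> List[float]:
    """Upsample a 1D signal by an integer factor.

    Hypothetical illustration: each output sample reads the input at a
    source coordinate shifted by its own offset, then linearly blends
    the two nearest input samples.
    """
    n = len(signal)
    if n == 0 or scale < 1:
        raise ValueError("need a non-empty signal and scale >= 1")
    out_len = n * scale
    if offsets is None:
        offsets = [0.0] * out_len  # zero offsets: plain linear interpolation
    if len(offsets) != out_len:
        raise ValueError("expected one offset per output sample")

    out = []
    for i in range(out_len):
        # Map the output index to a source coordinate (half-pixel centres),
        # then apply the per-sample offset (assumed to be learned elsewhere).
        src = (i + 0.5) / scale - 0.5 + offsets[i]
        src = min(max(src, 0.0), n - 1.0)  # clamp to the valid range
        left = int(math.floor(src))
        right = min(left + 1, n - 1)
        w = src - left  # weight of the right-hand neighbour
        out.append((1.0 - w) * signal[left] + w * signal[right])
    return out


if __name__ == "__main__":
    print(offset_upsample_1d([0.0, 10.0], scale=2))
    print(offset_upsample_1d([0.0, 10.0], scale=2, offsets=[0.0, 0.25, -0.25, 0.0]))
```

With all offsets at zero the function reduces to ordinary linear interpolation with clamped borders; nonzero offsets move the sampling position, which is the part a network could learn to adapt to the image content.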

Source journal: Biomedical Physics & Engineering Express (Radiology, Nuclear Medicine & Medical Imaging)

CiteScore: 2.80

Self-citation rate: 0.00%

Publication volume: 153

Journal description: BPEX is an inclusive, international, multidisciplinary journal devoted to publishing new research on any application of physics and/or engineering in medicine and/or biology. Characterized by a broad geographical coverage and a fast-track peer-review process, relevant topics include all aspects of biophysics, medical physics and biomedical engineering. Papers that are almost entirely clinical or biological in their focus are not suitable. The journal has an emphasis on publishing interdisciplinary work and bringing research fields together, encompassing experimental, theoretical and computational work.