Lightweight Multi-Stage Aggregation Transformer for robust medical image segmentation
Xiaoyan Wang, Yating Zhu, Ying Cui, Xiaojie Huang, Dongyan Guo, Pan Mu, Ming Xia, Cong Bai, Zhongzhao Teng, Shengyong Chen
Medical Image Analysis, Volume 103, Article 103569. DOI: 10.1016/j.media.2025.103569. Published 2025-04-18.
Capturing rich multi-scale features is essential for addressing the complex variations encountered in medical image segmentation. Multiple hybrid networks have been developed to integrate the complementary benefits of convolutional neural networks (CNNs) and Transformers. However, existing methods suffer either from the huge computational cost of complicated networks or from the unsatisfactory performance of lighter ones. How to fully exploit the advantages of both convolution and self-attention, and design networks that are both effective and efficient, remains an open problem. In this work, we propose a robust lightweight multi-stage hybrid architecture, named Multi-stage Aggregation Transformer version 2 (MA-TransformerV2), which extracts multi-scale features with progressive aggregation for accurate segmentation of highly variable medical images at low computational cost. Specifically, lightweight Trans blocks and lightweight CNN blocks are introduced in parallel into the dual-branch encoder module at each stage, and a vector quantization block is incorporated at the bottleneck to discretize the features and discard redundancy. This design not only enhances the representation capability and computational efficiency of the model, but also makes the model interpretable. Extensive experimental results on public datasets show that our method outperforms state-of-the-art methods, including CNN-based, Transformer-based, advanced hybrid CNN-Transformer, and several lightweight models, in terms of both segmentation accuracy and model capacity. Code will be made publicly available at https://github.com/zjmiaprojects/MATransformerV2.
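The two architectural ideas named in the abstract, a dual-branch encoder stage that runs a lightweight CNN block and a lightweight Transformer block in parallel, and a vector-quantization bottleneck that snaps features onto a learned codebook, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that); every module name, fusion choice, channel size, and codebook size below is an illustrative assumption.

```python
# Hypothetical sketch of a dual-branch encoder stage and a VQ bottleneck,
# loosely following the abstract's description. Not the official MA-TransformerV2 code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LightweightCNNBlock(nn.Module):
    """Depthwise-separable conv block; a common choice for a 'lightweight' CNN branch."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.norm = nn.BatchNorm2d(channels)

    def forward(self, x):
        return F.gelu(self.norm(self.pointwise(self.depthwise(x))))


class LightweightTransBlock(nn.Module):
    """Single multi-head self-attention layer over flattened spatial tokens."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        return attended.transpose(1, 2).reshape(b, c, h, w)


class DualBranchStage(nn.Module):
    """One encoder stage: CNN and Transformer branches run in parallel, then fuse."""
    def __init__(self, channels):
        super().__init__()
        self.cnn_branch = LightweightCNNBlock(channels)
        self.trans_branch = LightweightTransBlock(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)    # simple concat-and-project fusion

    def forward(self, x):
        return self.fuse(torch.cat([self.cnn_branch(x), self.trans_branch(x)], dim=1))


class VQBottleneck(nn.Module):
    """Nearest-codeword quantization (VQ-VAE style) with a straight-through estimator."""
    def __init__(self, channels, num_codes=512):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, channels)

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.permute(0, 2, 3, 1).reshape(-1, c)          # (B*H*W, C)
        dists = torch.cdist(flat, self.codebook.weight)      # distance to every codeword
        codes = self.codebook(dists.argmin(dim=1))           # snap each feature to nearest code
        quantized = codes.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return x + (quantized - x).detach()                  # straight-through gradient


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)                # toy feature map
    out = VQBottleneck(64)(DualBranchStage(64)(x))
    print(out.shape)                              # torch.Size([1, 64, 32, 32])
```

Fusing by concatenation and a 1x1 projection is only one plausible way to aggregate the two branches; the paper's "progressive aggregation" across stages is likely more involved, so treat this as a reading aid rather than a reference design.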
Journal Introduction:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.