Lightweight Multi-Stage Aggregation Transformer for robust medical image segmentation

IF 10.7 · CAS Region 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xiaoyan Wang , Yating Zhu , Ying Cui , Xiaojie Huang , Dongyan Guo , Pan Mu , Ming Xia , Cong Bai , Zhongzhao Teng , Shengyong Chen
DOI: 10.1016/j.media.2025.103569
Journal: Medical image analysis, Volume 103, Article 103569
Published: 2025-04-18
Citations: 0

Abstract

Capturing rich multi-scale features is essential for addressing the complex variations encountered in medical image segmentation. Numerous hybrid networks have been developed to combine the complementary strengths of convolutional neural networks (CNNs) and Transformers. However, existing methods suffer either from the huge computational cost of complicated networks or from the unsatisfactory performance of lighter ones. How to fully exploit the advantages of both convolution and self-attention, and to design networks that are both effective and efficient, remains an open problem. In this work, we propose a robust lightweight multi-stage hybrid architecture, named Multi-stage Aggregation Transformer version 2 (MA-TransformerV2), which extracts multi-scale features with progressive aggregation for accurate segmentation of highly variable medical images at low computational cost. Specifically, lightweight Transformer blocks and lightweight CNN blocks are introduced in parallel into the dual-branch encoder module at each stage, and a vector quantization block is incorporated at the bottleneck to discretize the features and discard redundancy. This design not only enhances the representation capability and computational efficiency of the model, but also makes the model interpretable. Extensive experiments on public datasets show that our method outperforms state-of-the-art approaches, including CNN-based, Transformer-based, advanced hybrid CNN-Transformer, and several lightweight models, in terms of both segmentation accuracy and model capacity. Code will be made publicly available at https://github.com/zjmiaprojects/MATransformerV2.
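The vector quantization step at the bottleneck can be illustrated with a minimal NumPy sketch: each encoder feature vector is snapped to its nearest entry in a learned codebook, which discretizes the representation and discards redundant variation. The array shapes, codebook size, and function name below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (L2 distance).

    features: (N, D) array of encoder feature vectors.
    codebook: (K, D) array of learned code vectors.
    Returns (quantized features (N, D), codebook indices (N,)).
    """
    # Pairwise squared distances between features and codes: (N, K)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)  # nearest code index per feature vector
    return codebook[idx], idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4))   # 8 feature vectors of dimension 4
    codes = rng.normal(size=(16, 4))  # codebook with 16 entries
    q, idx = vector_quantize(feats, codes)
    print(q.shape, idx.shape)  # → (8, 4) (8,)
```

In a trained model the codebook would be learned jointly with the encoder (e.g. VQ-VAE style, with a straight-through gradient estimator); this sketch only shows the nearest-neighbor discretization itself.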

Source journal: Medical image analysis (Engineering & Technology – Biomedical Engineering)
CiteScore: 22.10
Self-citation rate: 6.40%
Articles per year: 309
Review time: 6.6 months
Journal introduction: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.