Boosting Convolution With Efficient MLP-Permutation for Volumetric Medical Image Segmentation

Yi Lin;Xiao Fang;Dong Zhang;Kwang-Ting Cheng;Hao Chen
{"title":"Boosting Convolution With Efficient MLP-Permutation for Volumetric Medical Image Segmentation","authors":"Yi Lin;Xiao Fang;Dong Zhang;Kwang-Ting Cheng;Hao Chen","doi":"10.1109/TMI.2025.3530113","DOIUrl":null,"url":null,"abstract":"Recently, the advent of Vision Transformer (ViT) has brought substantial advancements in 3D benchmarks, particularly in 3D volumetric medical image segmentation (Vol-MedSeg). Concurrently, multi-layer perceptron (MLP) network has regained popularity among researchers due to their comparable results to ViT, albeit with the exclusion of the resource-intensive self-attention module. In this work, we propose a novel permutable hybrid network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both convolution neural networks (CNNs) and MLP. PHNet addresses the intrinsic anisotropy problem of 3D volumetric data by employing a combination of 2D and 3D CNNs to extract local features. Besides, we propose an efficient multi-layer permute perceptron (MLPP) module that captures long-range dependence while preserving positional information. This is achieved through an axis decomposition operation that permutes the input tensor along different axes, thereby enabling the separate encoding of the positional information. Furthermore, MLPP tackles the resolution sensitivity issue of MLP in Vol-MedSeg with a token segmentation operation, which divides the feature into smaller tokens and processes them individually. Extensive experimental results validate that PHNet outperformed the state-of-the-art methods with lower computational costs on the widely-used yet challenging COVID-19-20, Synapse, LiTS and MSD BraTS benchmarks. The ablation study also demonstrated the effectiveness of PHNet in harnessing the strengths of both CNNs and MLP. The code is available on Github: <uri>https://github.com/xiaofang007/PHNet</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 5","pages":"2341-2352"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on medical imaging","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10843792/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recently, the advent of the Vision Transformer (ViT) has brought substantial advances on 3D benchmarks, particularly in 3D volumetric medical image segmentation (Vol-MedSeg). Concurrently, multi-layer perceptron (MLP) networks have regained popularity among researchers because they achieve results comparable to ViT while dispensing with the resource-intensive self-attention module. In this work, we propose a novel permutable hybrid network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both convolutional neural networks (CNNs) and MLPs. PHNet addresses the intrinsic anisotropy of 3D volumetric data by employing a combination of 2D and 3D CNNs to extract local features. In addition, we propose an efficient multi-layer permute perceptron (MLPP) module that captures long-range dependencies while preserving positional information. This is achieved through an axis-decomposition operation that permutes the input tensor along different axes, enabling the positional information along each axis to be encoded separately. Furthermore, MLPP tackles the resolution sensitivity of MLPs in Vol-MedSeg with a token-segmentation operation, which divides the feature map into smaller tokens and processes them individually. Extensive experiments validate that PHNet outperforms state-of-the-art methods at lower computational cost on the widely used yet challenging COVID-19-20, Synapse, LiTS, and MSD BraTS benchmarks. An ablation study further demonstrates the effectiveness of PHNet in harnessing the strengths of both CNNs and MLPs. The code is available on GitHub: https://github.com/xiaofang007/PHNet.
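To make the axis-decomposition and token-segmentation ideas concrete, below is a minimal PyTorch sketch of the mechanism the abstract describes. It is an illustration, not the authors' implementation (see the linked repository for that); the class name `AxisTokenMLP`, the token size of 8, the MLP weights shared across axes, and the residual wiring are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class AxisTokenMLP(nn.Module):
    """Mixes information along one spatial axis of a (B, C, D, H, W)
    tensor by permuting that axis to the last position, splitting it
    into fixed-size tokens, and applying a shared MLP per token.

    Fixed-size tokens keep the MLP weight shapes independent of the
    input resolution, which is the point of token segmentation.
    """

    def __init__(self, token_size: int = 8):
        super().__init__()
        self.s = token_size
        self.mlp = nn.Sequential(
            nn.Linear(token_size, token_size),
            nn.GELU(),
            nn.Linear(token_size, token_size),
        )

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        # Axis decomposition: move the chosen spatial axis
        # (2 = D, 3 = H, 4 = W) to the last position.
        order = [d for d in range(5) if d != axis] + [axis]
        y = x.permute(*order)
        shape = y.shape
        assert shape[-1] % self.s == 0, "axis length must divide by token size"
        # Token segmentation: split the axis into tokens of length s,
        # so each token is mixed independently of the full resolution.
        y = y.reshape(*shape[:-1], shape[-1] // self.s, self.s)
        y = self.mlp(y)
        y = y.reshape(shape)
        # Undo the permutation to restore the (B, C, D, H, W) layout.
        inverse = [order.index(d) for d in range(5)]
        return y.permute(*inverse)


if __name__ == "__main__":
    mixer = AxisTokenMLP(token_size=8)
    vol = torch.randn(1, 16, 32, 64, 64)  # B, C, D, H, W
    out = vol
    for ax in (2, 3, 4):  # encode each spatial axis separately
        out = out + mixer(out, ax)  # residual axis-wise mixing
    print(out.shape)  # torch.Size([1, 16, 32, 64, 64])
```

Because each token is mixed in place and the permutation is undone afterwards, positional information along every axis is preserved, and the MLP weight shapes depend only on the token size rather than on the volume resolution, which is what makes the scheme insensitive to input resolution.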