Boosting Convolution With Efficient MLP-Permutation for Volumetric Medical Image Segmentation

Yi Lin;Xiao Fang;Dong Zhang;Kwang-Ting Cheng;Hao Chen
{"title":"Boosting Convolution With Efficient MLP-Permutation for Volumetric Medical Image Segmentation","authors":"Yi Lin;Xiao Fang;Dong Zhang;Kwang-Ting Cheng;Hao Chen","doi":"10.1109/TMI.2025.3530113","DOIUrl":null,"url":null,"abstract":"Recently, the advent of Vision Transformer (ViT) has brought substantial advancements in 3D benchmarks, particularly in 3D volumetric medical image segmentation (Vol-MedSeg). Concurrently, multi-layer perceptron (MLP) network has regained popularity among researchers due to their comparable results to ViT, albeit with the exclusion of the resource-intensive self-attention module. In this work, we propose a novel permutable hybrid network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both convolution neural networks (CNNs) and MLP. PHNet addresses the intrinsic anisotropy problem of 3D volumetric data by employing a combination of 2D and 3D CNNs to extract local features. Besides, we propose an efficient multi-layer permute perceptron (MLPP) module that captures long-range dependence while preserving positional information. This is achieved through an axis decomposition operation that permutes the input tensor along different axes, thereby enabling the separate encoding of the positional information. Furthermore, MLPP tackles the resolution sensitivity issue of MLP in Vol-MedSeg with a token segmentation operation, which divides the feature into smaller tokens and processes them individually. Extensive experimental results validate that PHNet outperformed the state-of-the-art methods with lower computational costs on the widely-used yet challenging COVID-19-20, Synapse, LiTS and MSD BraTS benchmarks. The ablation study also demonstrated the effectiveness of PHNet in harnessing the strengths of both CNNs and MLP. The code is available on Github: <uri>https://github.com/xiaofang007/PHNet</uri>.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 5","pages":"2341-2352"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on medical imaging","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10843792/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recently, the advent of the Vision Transformer (ViT) has brought substantial advances on 3D benchmarks, particularly in 3D volumetric medical image segmentation (Vol-MedSeg). Concurrently, multi-layer perceptron (MLP) networks have regained popularity among researchers because they achieve results comparable to ViT while dispensing with the resource-intensive self-attention module. In this work, we propose a novel permutable hybrid network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both convolutional neural networks (CNNs) and MLPs. PHNet addresses the intrinsic anisotropy of 3D volumetric data by employing a combination of 2D and 3D CNNs to extract local features. In addition, we propose an efficient multi-layer permute perceptron (MLPP) module that captures long-range dependencies while preserving positional information. This is achieved through an axis-decomposition operation that permutes the input tensor along different axes, enabling the positional information along each axis to be encoded separately. Furthermore, MLPP tackles the resolution sensitivity of MLPs in Vol-MedSeg with a token-segmentation operation, which divides the feature map into smaller tokens and processes them individually. Extensive experiments validate that PHNet outperforms state-of-the-art methods at lower computational cost on the widely used yet challenging COVID-19-20, Synapse, LiTS, and MSD BraTS benchmarks. An ablation study further demonstrates the effectiveness of PHNet in harnessing the strengths of both CNNs and MLPs. The code is available on GitHub: https://github.com/xiaofang007/PHNet.
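To make the axis-decomposition and token-segmentation ideas concrete, below is a minimal PyTorch sketch of the mechanism the abstract describes. It is an illustration, not the authors' implementation (see the linked repository for that); the class name `AxisTokenMLP`, the token size of 8, the MLP weights shared across axes, and the residual wiring are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class AxisTokenMLP(nn.Module):
    """Mixes information along one spatial axis of a (B, C, D, H, W)
    tensor by permuting that axis to the last position, splitting it
    into fixed-size tokens, and applying a shared MLP per token.

    Fixed-size tokens keep the MLP weight shapes independent of the
    input resolution, which is the point of token segmentation.
    """

    def __init__(self, token_size: int = 8):
        super().__init__()
        self.s = token_size
        self.mlp = nn.Sequential(
            nn.Linear(token_size, token_size),
            nn.GELU(),
            nn.Linear(token_size, token_size),
        )

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        # Axis decomposition: move the chosen spatial axis
        # (2 = D, 3 = H, 4 = W) to the last position.
        order = [d for d in range(5) if d != axis] + [axis]
        y = x.permute(*order)
        shape = y.shape
        assert shape[-1] % self.s == 0, "axis length must divide by token size"
        # Token segmentation: split the axis into tokens of length s,
        # so each token is mixed independently of the full resolution.
        y = y.reshape(*shape[:-1], shape[-1] // self.s, self.s)
        y = self.mlp(y)
        y = y.reshape(shape)
        # Undo the permutation to restore the (B, C, D, H, W) layout.
        inverse = [order.index(d) for d in range(5)]
        return y.permute(*inverse)


if __name__ == "__main__":
    mixer = AxisTokenMLP(token_size=8)
    vol = torch.randn(1, 16, 32, 64, 64)  # B, C, D, H, W
    out = vol
    for ax in (2, 3, 4):  # encode each spatial axis separately
        out = out + mixer(out, ax)  # residual axis-wise mixing
    print(out.shape)  # torch.Size([1, 16, 32, 64, 64])
```

Because each token is mixed in place and the permutation is undone afterwards, positional information along every axis is preserved, and the MLP weight shapes depend only on the token size rather than on the volume resolution, which is what makes the scheme insensitive to input resolution.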