Mambav3d: A mamba-based virtual 3D module stringing semantic information between layers of medical image slices

IF 3.7 | CAS Zone 2 (Engineering & Technology) | JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Xiaoxiao Liu, Yan Zhao, Shigang Wang, Jian Wei
{"title":"Mambav3d: A mamba-based virtual 3D module stringing semantic information between layers of medical image slices","authors":"Xiaoxiao Liu,&nbsp;Yan Zhao,&nbsp;Shigang Wang,&nbsp;Jian Wei","doi":"10.1016/j.displa.2024.102890","DOIUrl":null,"url":null,"abstract":"<div><div>High-precision medical image segmentation provides a reliable basis for clinical analysis and diagnosis. Researchers have developed various models to enhance the segmentation performance of medical images. Among these methods, two-dimensional models such as Unet exhibit a simple structure, low computational resource requirements, and strong local feature capture capabilities. However, their spatial information utilization is insufficient, limiting their segmentation accuracy. Three-dimensional models, such as 3D Unet, utilize spatial information more fully and are suitable for complex tasks, but they require high computational resources and have limited real-time performance. In this paper, we propose a virtual 3D module (Mambav3d) based on mamba, which introduces spatial information into 2D segmentation tasks to more fully integrate the 3D information of the image and further improve segmentation accuracy under conditions of low computational resource requirements. Mambav3d leverages the properties of hidden states in the state space model, combined with the shift of visual perspective, to incorporate semantic information between different anatomical planes in different slices of the same 3D sample. The voxel segmentation is converted to pixel segmentation to reduce model training data requirements and model complexity while ensuring that the model integrates 3D information and enhances segmentation accuracy. The model references the information from previous layers when labeling the current layer, thereby facilitating the transfer of semantic information between slice layers and avoiding the high computational cost associated with using structures such as Transformers between layers. We have implemented Mambav3d on Unet and evaluated its performance on the BraTs, Amos, and KiTs datasets, demonstrating superiority over other state-of-the-art methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"85 ","pages":"Article 102890"},"PeriodicalIF":3.7000,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938224002543","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

High-precision medical image segmentation provides a reliable basis for clinical analysis and diagnosis. Researchers have developed various models to enhance the segmentation performance of medical images. Among these methods, two-dimensional models such as Unet have a simple structure, low computational resource requirements, and strong local feature capture; however, they make insufficient use of spatial information, which limits their segmentation accuracy. Three-dimensional models, such as 3D Unet, exploit spatial information more fully and are suitable for complex tasks, but they demand high computational resources and offer limited real-time performance. In this paper, we propose a virtual 3D module (Mambav3d) based on Mamba, which introduces spatial information into 2D segmentation tasks to integrate the 3D information of the image more fully and further improve segmentation accuracy at low computational cost. Mambav3d leverages the properties of hidden states in the state space model, combined with a shift of visual perspective, to incorporate semantic information between different anatomical planes in different slices of the same 3D sample. Voxel segmentation is converted to pixel segmentation to reduce training data requirements and model complexity while ensuring that the model integrates 3D information and improves segmentation accuracy. The model references information from previous layers when labeling the current layer, thereby facilitating the transfer of semantic information between slice layers and avoiding the high computational cost of using structures such as Transformers between layers. We have implemented Mambav3d on Unet and evaluated its performance on the BraTS, AMOS, and KiTS datasets, demonstrating superiority over other state-of-the-art methods.
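As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below shows how a shared 2D backbone can segment a 3D volume slice by slice while a per-pixel hidden state is carried from one slice to the next, so that labeling of the current slice references semantic information from previous slices. The gated recurrence in InterSliceStatePropagation is only a lightweight stand-in for the selective state-space (Mamba) update, and all module names, shapes, and hyperparameters are illustrative assumptions.

# Hypothetical sketch, assuming a PyTorch-style implementation; the gated
# per-pixel recurrence stands in for the Mamba/SSM hidden-state update.
import torch
import torch.nn as nn


class Slice2DEncoder(nn.Module):
    """Tiny stand-in for a 2D Unet-style feature extractor."""

    def __init__(self, in_ch: int, feat_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class InterSliceStatePropagation(nn.Module):
    """Carries a hidden state across slices of one 3D volume.

    h_t = (1 - g_t) * h_{t-1} + g_t * f(x_t): a gated recurrence used here as a
    lightweight placeholder for the selective state-space (Mamba) update.
    """

    def __init__(self, feat_ch: int):
        super().__init__()
        self.gate = nn.Conv2d(feat_ch * 2, feat_ch, 1)
        self.update = nn.Conv2d(feat_ch, feat_ch, 1)

    def forward(self, feat, state):
        if state is None:
            state = torch.zeros_like(feat)
        g = torch.sigmoid(self.gate(torch.cat([feat, state], dim=1)))
        return (1 - g) * state + g * self.update(feat)


class VirtualVolumeSegmenter(nn.Module):
    """Segments a 3D volume slice by slice with a shared 2D backbone."""

    def __init__(self, in_ch=1, feat_ch=32, num_classes=2):
        super().__init__()
        self.encoder = Slice2DEncoder(in_ch, feat_ch)
        self.propagate = InterSliceStatePropagation(feat_ch)
        self.head = nn.Conv2d(feat_ch, num_classes, 1)

    def forward(self, volume):
        # volume: (B, C, D, H, W); iterate over the depth (slice) axis.
        state, logits = None, []
        for d in range(volume.shape[2]):
            feat = self.encoder(volume[:, :, d])      # 2D features for slice d
            state = self.propagate(feat, state)       # fuse with previous slices
            logits.append(self.head(state))           # label using fused state
        return torch.stack(logits, dim=2)             # (B, num_classes, D, H, W)


if __name__ == "__main__":
    model = VirtualVolumeSegmenter()
    out = model(torch.randn(1, 1, 8, 64, 64))
    print(out.shape)  # torch.Size([1, 2, 8, 64, 64])

The design point this sketch tries to capture is that only one slice's 2D features are held in memory at a time, while the recurrent state provides the cross-slice context that a full 3D network would otherwise obtain at much higher cost.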
Source journal
Displays (Engineering & Technology – Engineering: Electrical & Electronic)
CiteScore: 4.60
Self-citation rate: 25.60%
Articles per year: 138
Review time: 92 days
Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.