Mambav3d: A mamba-based virtual 3D module stringing semantic information between layers of medical image slices
Authors: Xiaoxiao Liu, Yan Zhao, Shigang Wang, Jian Wei
Journal: Displays, volume 85, Article 102890 (published 2024-11-15)
DOI: 10.1016/j.displa.2024.102890
URL: https://www.sciencedirect.com/science/article/pii/S0141938224002543
Impact factor: 3.7; JCR: Q1 (Computer Science, Hardware & Architecture)
Citation count: 0
Abstract
High-precision medical image segmentation provides a reliable basis for clinical analysis and diagnosis, and researchers have developed various models to improve it. Two-dimensional models such as Unet have a simple structure, low computational resource requirements, and strong local feature capture capabilities, but they make insufficient use of spatial information, which limits their segmentation accuracy. Three-dimensional models such as 3D Unet use spatial information more fully and suit complex tasks, but they require substantial computational resources and have limited real-time performance. In this paper, we propose Mambav3d, a virtual 3D module based on Mamba that introduces spatial information into 2D segmentation tasks, integrating the 3D information of the image more fully and improving segmentation accuracy while keeping computational resource requirements low. Mambav3d leverages the properties of hidden states in the state space model, combined with a shift of visual perspective, to incorporate semantic information between different anatomical planes in different slices of the same 3D sample. Voxel segmentation is converted to pixel segmentation, which reduces training data requirements and model complexity while still letting the model integrate 3D information and improve segmentation accuracy. When labeling the current slice layer, the model references information from previous slice layers, which transfers semantic information between slice layers and avoids the high computational cost of using structures such as Transformers between layers. We implemented Mambav3d on Unet and evaluated it on the BraTS, AMOS, and KiTS datasets, where it outperformed other state-of-the-art methods.
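The idea of carrying information from earlier slices into the current one can be illustrated with a toy recurrence. This is only a sketch and not the authors' Mambav3d implementation: the function name `scan_slices`, the fixed scalar `decay` and `gain` values, and the flat per-slice feature lists are all assumptions made for illustration. Mamba itself uses learned, input-dependent state-space parameters rather than fixed scalars.

```python
from typing import List


def scan_slices(
    slices: List[List[float]],
    decay: float = 0.9,  # hypothetical: how much of the old hidden state is kept
    gain: float = 0.1,   # hypothetical: how much of the current slice is written in
) -> List[List[float]]:
    """Fuse each slice's features with a hidden state built from earlier slices.

    `slices` holds one flat feature vector per 2D slice, ordered along the
    slicing axis. All slices are assumed to have the same length.
    """
    hidden = [0.0] * len(slices[0])
    fused = []
    for features in slices:
        # Linear state update: h_t = decay * h_{t-1} + gain * x_t
        hidden = [decay * h + gain * x for h, x in zip(hidden, features)]
        # Residual fusion: the current slice plus context from slices 0..t
        fused.append([x + h for x, h in zip(features, hidden)])
    return fused


if __name__ == "__main__":
    volume = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]  # three toy slices
    for index, row in enumerate(scan_slices(volume)):
        print(index, row)
```

Because the hidden state has a fixed size, the cost of processing each slice stays constant no matter how many slices came before, whereas attention between slices would grow with the number of slices compared. The recurrence is also causal: changing a later slice leaves the outputs of earlier slices unchanged.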
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.