Mamba4PASS: Vision Mamba for PAnoramic Semantic Segmentation
Jiayue Xu, Chao Xu, Jianping Zhao, Cheng Han, Hua Li
Displays, Vol. 89, Article 103058 (journal article, published 2025-04-23). DOI: 10.1016/j.displa.2025.103058
https://www.sciencedirect.com/science/article/pii/S0141938225000952
Abstract
PAnoramic Semantic Segmentation (PASS) is a significant and challenging computer vision task that aims at comprehensive scene understanding from an ultra-wide-angle view. Although the equirectangular projection (ERP) carries richer contextual information, it suffers from geometric distortion and spatial discontinuity, both of which impede PASS. Despite recent progress in PASS, existing methods often face a dilemma between global perception and computational efficiency, as well as a difficult trade-off between handling geometric distortion and handling spatial discontinuity. To address these issues, we propose Mamba4PASS, a novel PASS framework that is more efficient than Transformer-based backbone models. We introduce an Incremental Feature Fusion (IFF) module that progressively integrates semantic features from deeper layers with spatial detail features from shallower layers, alleviating the loss of local detail caused by the State Space Model (SSM). We further introduce a Spherical Geometry-Aware Deformable Patch Embedding (SGADPE) module, which exploits spherical geometry and a novel deformable convolution strategy to adapt to ERP images, addressing spatial discontinuities and stabilizing geometric distortions. To the best of our knowledge, this is the first panoramic semantic segmentation model based on the Mamba architecture. We explore the potential of this approach for PASS and provide a new solution in this domain. Extensive experiments demonstrate the effectiveness of the proposed method, which achieves state-of-the-art results compared with existing approaches.
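The abstract does not give the formulation of SGADPE, so the following is only a generic illustration of the two ERP properties it is said to target, not the authors' module. Two well-known facts are assumed: a row at latitude φ is stretched horizontally by roughly 1/cos φ (every row has the same pixel width, but the circle of latitude shrinks with cos φ), and the left and right image borders are the same meridian. The sketch widens a regular kernel's horizontal taps by that factor, producing per-row offsets of the kind a deformable convolution consumes, and samples a row with horizontal wrap-around. The function names and the clipping value `max_stretch` are hypothetical.

```python
import numpy as np

def erp_latitudes(height):
    """Latitude (radians) at the center of each ERP row, from near +pi/2 (top) to near -pi/2 (bottom)."""
    rows = np.arange(height) + 0.5
    return np.pi / 2 - rows * np.pi / height

def distortion_aware_offsets(height, kernel_size=3, max_stretch=8.0):
    """Hypothetical per-row horizontal offsets for a kernel_size-wide tap grid.

    Each tap at horizontal position t is moved to t * stretch, where
    stretch = 1/cos(lat) approximates the ERP horizontal stretch, clipped to
    max_stretch so the offsets stay bounded near the poles.
    Returns shape (height, kernel_size): the displacement added to the regular grid.
    """
    k = kernel_size // 2
    taps = np.arange(-k, k + 1, dtype=float)  # regular horizontal taps, e.g. [-1, 0, 1]
    stretch = np.minimum(1.0 / np.cos(erp_latitudes(height)), max_stretch)
    return taps[None, :] * (stretch[:, None] - 1.0)

def sample_row_wrapped(row, center, offsets):
    """Linearly interpolate a 1-D ERP row at center + tap + offset.

    Indices wrap modulo the width because longitude -180 and +180 degrees coincide,
    so a tap that falls off the right edge reads from the left edge.
    """
    width = row.shape[0]
    k = len(offsets) // 2
    pos = center + np.arange(-k, k + 1) + offsets
    x0 = np.floor(pos).astype(int)
    frac = pos - x0
    left = row[x0 % width]          # NumPy integer % is non-negative for a positive divisor
    right = row[(x0 + 1) % width]
    return (1 - frac) * left + frac * right

offs = distortion_aware_offsets(64)
print(offs[0])   # taps near the top row are spread far apart (clipped stretch)
print(offs[31])  # taps near the equator stay close to the regular grid
```

Clipping the stretch is a design choice made here only to keep the sketch numerically bounded; a real distortion-aware embedding might instead learn the offsets or resample on the sphere.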
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human-factors engineers new to the field, are also occasionally featured.