Mamba4PASS: Vision Mamba for PAnoramic Semantic Segmentation

IF 3.7 | Zone 2, Engineering Technology | JCR Q1: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays Pub Date : 2025-04-23 DOI:10.1016/j.displa.2025.103058

Jiayue Xu, Chao Xu, Jianping Zhao, Cheng Han, Hua Li

Displays, volume 89, Article 103058. Journal Article. https://www.sciencedirect.com/science/article/pii/S0141938225000952

Citations: 0

Abstract

PAnoramic Semantic Segmentation (PASS) is a significant and challenging task in computer vision that aims at comprehensive scene understanding through an ultra-wide-angle view. However, the equirectangular projection (ERP), although it carries richer contextual information, is susceptible to geometric distortion and spatial discontinuity, both of which impede the efficacy of PASS. Significant progress has recently been made in PASS; nevertheless, existing methods often face a dilemma between global perception and efficient computation, and struggle to balance the handling of geometric distortion against that of spatial discontinuity. To address this, we propose Mamba4PASS, a novel framework for PASS that is more efficient than Transformer-based backbone models. We introduce an Incremental Feature Fusion (IFF) module that gradually integrates semantic features from deeper layers with spatial detail features from shallower layers, effectively alleviating the loss of local details caused by the State Space Model (SSM). Additionally, we introduce a Spherical Geometry-Aware Deformable Patch Embedding (SGADPE) module, which leverages spherical geometry properties and employs a novel deformable convolution strategy to adapt to ERPs, effectively addressing spatial discontinuities and stabilizing geometric distortions. To the best of our knowledge, this is the first semantic segmentation model for panoramic images based on the Mamba architecture. We explore the potential of this approach for PASS, providing a new solution for this domain. Extensive experiments demonstrate the effectiveness of the proposed method, which achieves state-of-the-art results compared to existing approaches.
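The two ERP properties the abstract names, geometric distortion and spatial discontinuity, can be illustrated with a small sketch. This is not the authors' SGADPE module; it only shows the standard geometry of an equirectangular image, in which a row at latitude φ is stretched horizontally by roughly 1/cos(φ), and the left and right image borders are neighbors on the sphere. The function names and the 8-row example are hypothetical, chosen for illustration.

```python
import math

def erp_row_latitudes(height):
    """Latitude (radians) at the center of each ERP pixel row, from top to bottom."""
    return [math.pi / 2 - (r + 0.5) * math.pi / height for r in range(height)]

def horizontal_stretch(height):
    """Horizontal over-sampling factor 1/cos(latitude) for each ERP row.

    Rows near the poles cover far less of the sphere than rows near the
    equator, so objects there appear stretched (geometric distortion).
    """
    return [1.0 / math.cos(lat) for lat in erp_row_latitudes(height)]

def wrap_neighbors(width, col):
    """Left and right neighbor columns of `col`, wrapping across the 360-degree seam.

    A plain 2D convolution treats column 0 and column width-1 as far apart,
    although they are adjacent on the sphere (spatial discontinuity).
    """
    return ((col - 1) % width, (col + 1) % width)

# Hypothetical 8-row panorama: stretch is largest at the top and bottom rows.
stretch = horizontal_stretch(8)
for row, factor in enumerate(stretch):
    print(f"row {row}: stretch ~ {factor:.2f}")

# The first column's left neighbor is the last column.
print(wrap_neighbors(16, 0))
```

A geometry-aware patch embedding would, in some form, have to account for both effects: a latitude-dependent sampling pattern for the stretch, and wrap-around handling at the horizontal seam.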


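Source journal

Displays (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 4.60

Self-citation rate: 25.60%

Articles published: 138

Review time: 92 days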

Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.