CollaMamba: Efficient Collaborative Perception with Cross-Agent Spatial-Temporal State Space Model
Yang Li, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Xuanhan Zhu, Yujia Yang, Rui Pan, Jinglin Li
arXiv:2409.07714 · arXiv - CS - Multiagent Systems · 2024-09-12
Abstract
By sharing complementary perceptual information, multi-agent collaborative perception fosters a deeper understanding of the environment. Recent studies on collaborative perception mostly utilize CNNs or Transformers to learn feature representation and fusion in the spatial dimension, and these struggle to capture long-range spatial-temporal features under limited computing and communication resources. Holistically modeling dependencies over extensive spatial areas and extended temporal frames is crucial for enhancing feature quality. To this end, we propose a resource-efficient cross-agent spatial-temporal collaborative state space model (SSM), named CollaMamba. First, we construct a foundational backbone network based on a spatial SSM. This backbone adeptly captures positional causal dependencies from both single-agent and cross-agent views, yielding compact yet comprehensive intermediate features while maintaining linear complexity. Furthermore, we devise a history-aware feature boosting module based on a temporal SSM, which extracts contextual cues from extended historical frames to refine vague features while keeping overhead low. Extensive experiments across several datasets demonstrate that CollaMamba outperforms state-of-the-art methods, achieving higher accuracy while reducing computational overhead by up to 71.9% and communication overhead to as little as 1/64. This work pioneers the exploration of Mamba's potential in collaborative perception. The source code will be made available.
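The abstract's central claim is that spatial and temporal SSM modules can model long token sequences at linear cost. Since the authors' code is not yet released, the sketch below only illustrates the generic discretized, diagonal SSM scan that Mamba-style backbones build on; the function name `ssm_scan`, the tensor shapes, and the parameterization are illustrative assumptions, not the CollaMamba implementation.

```python
# Minimal sketch of a diagonal, discretized state space model (SSM) scan --
# the linear-complexity sequence operator underlying Mamba-style backbones.
# Generic illustration only; shapes and names are assumptions, not the paper's code.
import numpy as np

def ssm_scan(x, log_A, B, C, dt):
    """Run a diagonal SSM over a sequence.

    x:     (T, D)  input features (e.g., flattened BEV tokens)
    log_A: (D, N)  A is parameterized as -exp(log_A) to keep it negative/stable
    B:     (D, N)  input projection
    C:     (D, N)  output projection
    dt:    (D,)    per-channel step size
    Returns y: (T, D); time is O(T), state is O(D*N).
    """
    # Zero-order-hold discretization of A; B is discretized with the common
    # first-order approximation B_bar ~= dt * B.
    A_bar = np.exp(dt[:, None] * -np.exp(log_A))          # (D, N)
    h = np.zeros_like(A_bar)                              # hidden state
    y = np.empty_like(x)
    for t in range(x.shape[0]):                           # sequential scan
        h = A_bar * h + (dt[:, None] * B) * x[t][:, None]
        y[t] = np.sum(h * C, axis=-1)
    return y

# Toy usage: 128 tokens, 32 channels, state size 16 per channel.
rng = np.random.default_rng(0)
T, D, N = 128, 32, 16
y = ssm_scan(rng.standard_normal((T, D)),
             rng.standard_normal((D, N)),
             0.1 * rng.standard_normal((D, N)),
             0.1 * rng.standard_normal((D, N)),
             np.full(D, 0.01))
print(y.shape)  # (128, 32)
```

The key property this loop makes visible is that each step touches only a fixed-size hidden state, so cost grows linearly with sequence length, in contrast to the quadratic attention maps of Transformer-based fusion; selective (input-dependent) parameters and cross-agent token ordering, as described in the abstract, are layered on top of this basic scan.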