CollaMamba: Efficient Collaborative Perception with Cross-Agent Spatial-Temporal State Space Model

Yang Li, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Xuanhan Zhu, Yujia Yang, Rui Pan, Jinglin Li
{"title":"CollaMamba:利用跨代理时空状态模型实现高效协作感知","authors":"Yang Li, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Xuanhan Zhu, Yujia Yang, Rui Pan, Jinglin Li","doi":"arxiv-2409.07714","DOIUrl":null,"url":null,"abstract":"By sharing complementary perceptual information, multi-agent collaborative\nperception fosters a deeper understanding of the environment. Recent studies on\ncollaborative perception mostly utilize CNNs or Transformers to learn feature\nrepresentation and fusion in the spatial dimension, which struggle to handle\nlong-range spatial-temporal features under limited computing and communication\nresources. Holistically modeling the dependencies over extensive spatial areas\nand extended temporal frames is crucial to enhancing feature quality. To this\nend, we propose a resource efficient cross-agent spatial-temporal collaborative\nstate space model (SSM), named CollaMamba. Initially, we construct a\nfoundational backbone network based on spatial SSM. This backbone adeptly\ncaptures positional causal dependencies from both single-agent and cross-agent\nviews, yielding compact and comprehensive intermediate features while\nmaintaining linear complexity. Furthermore, we devise a history-aware feature\nboosting module based on temporal SSM, extracting contextual cues from extended\nhistorical frames to refine vague features while preserving low overhead.\nExtensive experiments across several datasets demonstrate that CollaMamba\noutperforms state-of-the-art methods, achieving higher model accuracy while\nreducing computational and communication overhead by up to 71.9% and 1/64,\nrespectively. This work pioneers the exploration of the Mamba's potential in\ncollaborative perception. The source code will be made available.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"34 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CollaMamba: Efficient Collaborative Perception with Cross-Agent Spatial-Temporal State Space Model\",\"authors\":\"Yang Li, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Xuanhan Zhu, Yujia Yang, Rui Pan, Jinglin Li\",\"doi\":\"arxiv-2409.07714\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"By sharing complementary perceptual information, multi-agent collaborative\\nperception fosters a deeper understanding of the environment. Recent studies on\\ncollaborative perception mostly utilize CNNs or Transformers to learn feature\\nrepresentation and fusion in the spatial dimension, which struggle to handle\\nlong-range spatial-temporal features under limited computing and communication\\nresources. Holistically modeling the dependencies over extensive spatial areas\\nand extended temporal frames is crucial to enhancing feature quality. To this\\nend, we propose a resource efficient cross-agent spatial-temporal collaborative\\nstate space model (SSM), named CollaMamba. Initially, we construct a\\nfoundational backbone network based on spatial SSM. This backbone adeptly\\ncaptures positional causal dependencies from both single-agent and cross-agent\\nviews, yielding compact and comprehensive intermediate features while\\nmaintaining linear complexity. 
Furthermore, we devise a history-aware feature\\nboosting module based on temporal SSM, extracting contextual cues from extended\\nhistorical frames to refine vague features while preserving low overhead.\\nExtensive experiments across several datasets demonstrate that CollaMamba\\noutperforms state-of-the-art methods, achieving higher model accuracy while\\nreducing computational and communication overhead by up to 71.9% and 1/64,\\nrespectively. This work pioneers the exploration of the Mamba's potential in\\ncollaborative perception. The source code will be made available.\",\"PeriodicalId\":501315,\"journal\":{\"name\":\"arXiv - CS - Multiagent Systems\",\"volume\":\"34 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multiagent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07714\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07714","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0

Abstract

By sharing complementary perceptual information, multi-agent collaborative perception fosters a deeper understanding of the environment. Recent studies on collaborative perception mostly utilize CNNs or Transformers to learn feature representation and fusion in the spatial dimension, which struggle to handle long-range spatial-temporal features under limited computing and communication resources. Holistically modeling the dependencies over extensive spatial areas and extended temporal frames is crucial to enhancing feature quality. To this end, we propose a resource-efficient cross-agent spatial-temporal collaborative state space model (SSM), named CollaMamba. Initially, we construct a foundational backbone network based on a spatial SSM. This backbone adeptly captures positional causal dependencies from both single-agent and cross-agent views, yielding compact and comprehensive intermediate features while maintaining linear complexity. Furthermore, we devise a history-aware feature-boosting module based on a temporal SSM, extracting contextual cues from extended historical frames to refine vague features while preserving low overhead. Extensive experiments across several datasets demonstrate that CollaMamba outperforms state-of-the-art methods, achieving higher model accuracy while reducing computational overhead by up to 71.9% and communication overhead to as little as 1/64. This work pioneers the exploration of Mamba's potential in collaborative perception. The source code will be made available.
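For readers unfamiliar with state space models, the recurrence below illustrates why an SSM scan runs in linear time over sequence length, the property behind the abstract's "linear complexity" claim. This is a minimal, generic sketch of a Mamba-style diagonal selective scan in PyTorch, not CollaMamba's implementation (the source code has not been released); production kernels replace the Python loop with a fused parallel scan.

```python
import torch

def selective_scan(x, delta, A, B, C):
    """Diagonal selective SSM scan (illustrative, sequential form).

    x:     (batch, length, d_model)  input features
    delta: (batch, length, d_model)  input-dependent step sizes
    A:     (d_model, d_state)        state transition (negative real parts)
    B, C:  (batch, length, d_state)  input-dependent projections
    Returns y: (batch, length, d_model)
    """
    b, l, d = x.shape
    n = A.shape[-1]
    h = torch.zeros(b, d, n, device=x.device)
    ys = []
    for t in range(l):
        # Discretize: A_bar = exp(delta * A), B_bar*x ~ delta * B * x (ZOH-style)
        dA = torch.exp(delta[:, t].unsqueeze(-1) * A)                          # (b, d, n)
        dBx = delta[:, t].unsqueeze(-1) * B[:, t].unsqueeze(1) * x[:, t].unsqueeze(-1)
        h = dA * h + dBx                                # O(1) state update per token
        ys.append((h * C[:, t].unsqueeze(1)).sum(-1))   # project state to output
    return torch.stack(ys, dim=1)

# Usage with toy shapes:
b, l, d, n = 2, 16, 32, 8
x = torch.randn(b, l, d)
delta = torch.nn.functional.softplus(torch.randn(b, l, d))  # positive step sizes
A = -torch.rand(d, n)                                        # stable dynamics
B, C = torch.randn(b, l, n), torch.randn(b, l, n)
y = selective_scan(x, delta, A, B, C)                        # (2, 16, 32)
```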
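The abstract describes a spatial SSM backbone that captures positional causal dependencies from both single-agent and cross-agent views. One plausible way to organize such a pass is sketched below; the ego-first token order, the flattening scheme, and the `ssm_block` interface are all hypothetical, since the paper's code is unavailable.

```python
import torch

def cross_agent_spatial_scan(ego_bev, collab_bevs, ssm_block):
    """Scan ego + collaborator BEV maps as one causal token sequence.

    ego_bev:     (b, c, h, w) ego agent's BEV feature map
    collab_bevs: list of (b, c, h, w) maps shared by collaborators
    ssm_block:   any (b, seq, c) -> (b, seq, c) sequence model
    """
    feats = [ego_bev] + collab_bevs
    # Flatten each map to (b, h*w, c) tokens and concatenate along the sequence.
    tokens = torch.cat([f.flatten(2).transpose(1, 2) for f in feats], dim=1)
    tokens = ssm_block(tokens)              # linear-complexity causal scan
    b, c, h, w = ego_bev.shape
    ego_out = tokens[:, : h * w]            # recover the refined ego tokens
    return ego_out.transpose(1, 2).reshape(b, c, h, w)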
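Similarly, the history-aware feature-boosting module could be read as a temporal SSM scan over a short window of past frames. The sketch below is again an assumption-laden illustration of the general idea: the per-pixel temporal sequencing, the `temporal_ssm` interface, and the residual fusion are guesses, not the authors' design.

```python
import torch

def history_aware_boost(history, current, temporal_ssm):
    """Refine the current BEV frame with contextual cues from past frames.

    history:      list of K (b, c, h, w) past BEV feature maps
    current:      (b, c, h, w) current-frame features
    temporal_ssm: any (seq_batch, t, c) -> (seq_batch, t, c) sequence model
    """
    frames = torch.stack(history + [current], dim=1)              # (b, K+1, c, h, w)
    b, t, c, h, w = frames.shape
    # Treat each spatial location as its own temporal sequence of length K+1.
    seq = frames.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
    seq = temporal_ssm(seq)                                        # causal scan over time
    boosted = seq[:, -1].reshape(b, h, w, c).permute(0, 3, 1, 2)   # newest step only
    return current + boosted                                       # residual refinement
```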