Lei Zhang;Binglu Wang;Yongqiang Zhao;Yuan Yuan;Tianfei Zhou;Zhijun Li
{"title":"用于多代理感知的协作式多模态融合网络","authors":"Lei Zhang;Binglu Wang;Yongqiang Zhao;Yuan Yuan;Tianfei Zhou;Zhijun Li","doi":"10.1109/TCYB.2024.3491756","DOIUrl":null,"url":null,"abstract":"With the increasing popularity of autonomous driving systems and their applications in complex transportation scenarios, collaborative perception among multiple intelligent agents has become an important research direction. Existing single-agent multimodal fusion approaches are limited by their inability to leverage additional sensory data from nearby agents. In this article, we present the collaborative multimodal fusion network (CMMFNet) for distributed perception in multiagent systems. CMMFNet first extracts modality-specific features from LiDAR point clouds and camera images for each agent using dual-stream neural networks. To overcome the ambiguity in-depth prediction, we introduce a collaborative depth supervision module that projects dense fused point clouds onto image planes to generate more accurate depth ground truths. We then present modality-aware fusion strategies to aggregate homogeneous features across agents while preserving their distinctive properties. To align heterogeneous LiDAR and camera features, we introduce a modality consistency learning method. Finally, a transformer-based fusion module dynamically captures cross-modal correlations to produce a unified representation. Comprehensive evaluations on two extensive multiagent perception datasets, OPV2V and V2XSet, affirm the superiority of CMMFNet in detection performance, establishing a new benchmark in the field.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 1","pages":"486-498"},"PeriodicalIF":9.4000,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Collaborative Multimodal Fusion Network for Multiagent Perception\",\"authors\":\"Lei Zhang;Binglu Wang;Yongqiang Zhao;Yuan Yuan;Tianfei Zhou;Zhijun Li\",\"doi\":\"10.1109/TCYB.2024.3491756\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the increasing popularity of autonomous driving systems and their applications in complex transportation scenarios, collaborative perception among multiple intelligent agents has become an important research direction. Existing single-agent multimodal fusion approaches are limited by their inability to leverage additional sensory data from nearby agents. In this article, we present the collaborative multimodal fusion network (CMMFNet) for distributed perception in multiagent systems. CMMFNet first extracts modality-specific features from LiDAR point clouds and camera images for each agent using dual-stream neural networks. To overcome the ambiguity in-depth prediction, we introduce a collaborative depth supervision module that projects dense fused point clouds onto image planes to generate more accurate depth ground truths. We then present modality-aware fusion strategies to aggregate homogeneous features across agents while preserving their distinctive properties. To align heterogeneous LiDAR and camera features, we introduce a modality consistency learning method. Finally, a transformer-based fusion module dynamically captures cross-modal correlations to produce a unified representation. 
Comprehensive evaluations on two extensive multiagent perception datasets, OPV2V and V2XSet, affirm the superiority of CMMFNet in detection performance, establishing a new benchmark in the field.\",\"PeriodicalId\":13112,\"journal\":{\"name\":\"IEEE Transactions on Cybernetics\",\"volume\":\"55 1\",\"pages\":\"486-498\"},\"PeriodicalIF\":9.4000,\"publicationDate\":\"2024-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Cybernetics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10756193/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cybernetics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10756193/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Collaborative Multimodal Fusion Network for Multiagent Perception
With the increasing popularity of autonomous driving systems and their applications in complex transportation scenarios, collaborative perception among multiple intelligent agents has become an important research direction. Existing single-agent multimodal fusion approaches are limited by their inability to leverage additional sensory data from nearby agents. In this article, we present the collaborative multimodal fusion network (CMMFNet) for distributed perception in multiagent systems. CMMFNet first extracts modality-specific features from LiDAR point clouds and camera images for each agent using dual-stream neural networks. To overcome the ambiguity in depth prediction, we introduce a collaborative depth supervision module that projects dense fused point clouds onto image planes to generate more accurate depth ground truths. We then present modality-aware fusion strategies to aggregate homogeneous features across agents while preserving their distinctive properties. To align heterogeneous LiDAR and camera features, we introduce a modality consistency learning method. Finally, a transformer-based fusion module dynamically captures cross-modal correlations to produce a unified representation. Comprehensive evaluations on two large-scale multiagent perception datasets, OPV2V and V2XSet, affirm the superiority of CMMFNet in detection performance, establishing a new benchmark in the field.
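To make the collaborative depth supervision idea concrete, below is a minimal sketch of how a dense fused point cloud could be projected onto an image plane to produce a depth ground-truth map. This is not the authors' implementation; the function name, the intrinsic/extrinsic conventions, and the rasterization policy are illustrative assumptions.

```python
# Minimal sketch: project fused LiDAR points into a camera view to build a
# sparse depth ground-truth map (assumed conventions, not the paper's code).
import numpy as np

def project_points_to_depth_map(points_lidar, T_cam_from_lidar, K, image_hw):
    """Project LiDAR points (N, 3) into a camera and rasterize a depth map.

    points_lidar     : (N, 3) points in the (fused) LiDAR frame.
    T_cam_from_lidar : (4, 4) extrinsic transform, LiDAR frame -> camera frame.
    K                : (3, 3) camera intrinsic matrix.
    image_hw         : (H, W) target image size.
    Returns an (H, W) float32 depth map; pixels with no point stay 0.
    """
    H, W = image_hw

    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.concatenate([points_lidar, np.ones((points_lidar.shape[0], 1))], axis=1)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 1e-3]

    # Perspective projection onto the image plane.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u = uv[:, 0].astype(int)
    v = uv[:, 1].astype(int)
    z = pts_cam[:, 2]

    # Discard projections that fall outside the image bounds.
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z = u[valid], v[valid], z[valid]

    # Rasterize, keeping the nearest depth when several points hit one pixel:
    # write far points first so nearer points overwrite them.
    depth = np.zeros((H, W), dtype=np.float32)
    order = np.argsort(-z)
    depth[v[order], u[order]] = z[order]
    return depth
```

Because the fused cloud aggregates points shared by nearby agents, the resulting depth map is denser than what a single agent's LiDAR alone would yield, which is the motivation the abstract gives for using it as supervision for camera-branch depth prediction.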
Journal introduction:
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the Transactions welcomes papers on communication and control across machines, or across machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.