{"title":"PolarFusion: A multi-modal fusion algorithm for 3D object detection based on polar coordinates.","authors":"Peicheng Shi, Runshuai Ge, Xinlong Dong, Chadia Chakir, Taonian Liang, Aixi Yang","doi":"10.1016/j.neunet.2025.107704","DOIUrl":null,"url":null,"abstract":"<p><p>Existing 3D object detection algorithms that fuse multi-modal sensor information typically operate in Cartesian coordinates, which can lead to asymmetrical feature information and uneven attention across multiple views. To address this, we propose PolarFusion, the first multi-modal fusion BEV object detection algorithm based on polar coordinates. We designed three specialized modules for this approach: the Polar Region Candidates Generation Module, the Polar Region Query Generation Module, and the Polar Region Information Fusion Module. In the Polar Region Candidates Generation Module, we use a region proposal-based segmentation method to remove irrelevant areas from images, enhancing PolarFusion's information processing efficiency. These segmented image regions are then integrated into the point cloud segmentation task, addressing feature misalignment during fusion. The Polar Region Query Generation Module leverages prior information to generate high-quality target queries, reducing the time spent learning from initialization. For the Polar Region Information Fusion Module, PolarFusion employs a simple yet efficient self-attention to merge internal information from images and point clouds. This captures long-range dependencies in image texture information while preserving the precise positional data from point clouds, enabling more accurate BEV object detection. We conducted extensive experiments on challenging BEV object detection datasets. Both qualitative and quantitative results demonstrate that PolarFusion achieves an NDS of 76.1% and mAP of 74.5% on the nuScenes test set, significantly outperforming Cartesian-based methods. This advancement enhances the environmental perception capabilities of autonomous vehicles and contributes to the development of future intelligent transportation systems. The code will be released at https://github.com/RunshuaiGe/PolarFusion.git.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"190 ","pages":"107704"},"PeriodicalIF":6.0000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1016/j.neunet.2025.107704","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Existing 3D object detection algorithms that fuse multi-modal sensor information typically operate in Cartesian coordinates, which can lead to asymmetrical feature information and uneven attention across multiple views. To address this, we propose PolarFusion, the first multi-modal fusion BEV object detection algorithm based on polar coordinates. We designed three specialized modules for this approach: the Polar Region Candidates Generation Module, the Polar Region Query Generation Module, and the Polar Region Information Fusion Module. In the Polar Region Candidates Generation Module, we use a region-proposal-based segmentation method to remove irrelevant areas from images, improving PolarFusion's information-processing efficiency. These segmented image regions are then integrated into the point-cloud segmentation task, addressing feature misalignment during fusion. The Polar Region Query Generation Module leverages prior information to generate high-quality target queries, reducing the time otherwise spent learning queries from random initialization. In the Polar Region Information Fusion Module, PolarFusion employs a simple yet efficient self-attention mechanism to fuse information from images and point clouds, capturing long-range dependencies in image texture while preserving the precise positional data of the point cloud, which enables more accurate BEV object detection. We conducted extensive experiments on challenging BEV object detection datasets. Both qualitative and quantitative results demonstrate that PolarFusion achieves an NDS of 76.1% and an mAP of 74.5% on the nuScenes test set, significantly outperforming Cartesian-based methods. This advancement enhances the environmental perception capabilities of autonomous vehicles and contributes to the development of future intelligent transportation systems. The code will be released at https://github.com/RunshuaiGe/PolarFusion.git.
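To make the polar BEV representation concrete, here is a minimal, hypothetical sketch (not code from the paper; the function name, grid resolution, and binning scheme are our own assumptions) that rasterizes LiDAR points into a range-azimuth occupancy grid rather than a Cartesian x-y grid:

```python
import numpy as np

def polar_bev_grid(points, max_range=50.0, num_range_bins=128, num_azimuth_bins=256):
    """Rasterize LiDAR points of shape (N, 3) into a polar BEV occupancy grid.

    Cells are indexed by (range bin, azimuth bin) instead of (x, y), so every
    azimuth sector receives the same angular resolution -- the symmetry that
    polar BEV representations exploit. All parameters here are illustrative.
    """
    x, y = points[:, 0], points[:, 1]
    r = np.hypot(x, y)            # radial distance from the ego vehicle
    theta = np.arctan2(y, x)      # azimuth angle in [-pi, pi]

    keep = r < max_range          # discard points beyond the sensing range
    r_bin = (r[keep] / max_range * num_range_bins).astype(int)
    t_bin = ((theta[keep] + np.pi) / (2 * np.pi) * num_azimuth_bins).astype(int)
    r_bin = np.clip(r_bin, 0, num_range_bins - 1)
    t_bin = np.clip(t_bin, 0, num_azimuth_bins - 1)

    grid = np.zeros((num_range_bins, num_azimuth_bins), dtype=np.float32)
    grid[r_bin, t_bin] = 1.0      # binary occupancy per polar cell
    return grid

# Example: 1000 random points in a 100 m x 100 m area around the ego vehicle
pts = np.random.uniform(-50, 50, size=(1000, 3))
bev = polar_bev_grid(pts)
print(bev.shape)  # (128, 256)
```

Because every azimuth sector covers the same angular extent, all viewing directions are treated uniformly, which is the symmetry argument the abstract makes against Cartesian grids.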
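The fusion step can likewise be illustrated with a generic sketch. Below, standard multi-head self-attention (PyTorch's nn.MultiheadAttention) runs over the concatenation of image and point-cloud tokens; this is an assumption-laden stand-in, not the paper's actual Polar Region Information Fusion Module:

```python
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    """Illustrative fusion of image and point-cloud tokens via standard
    multi-head self-attention (hypothetical; the paper's Polar Region
    Information Fusion Module may differ in structure and detail)."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, pc_tokens):
        # Concatenate both modalities into one sequence so attention can
        # capture long-range dependencies in image texture while attending
        # to the precise positions carried by the point-cloud tokens.
        tokens = torch.cat([img_tokens, pc_tokens], dim=1)  # (B, Ni+Np, C)
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + fused)                    # residual + norm

# Example shapes: batch of 2, 100 image tokens, 50 point-cloud tokens
fusion = TokenFusion()
out = fusion(torch.randn(2, 100, 256), torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 150, 256])
```

Joint attention over the concatenated sequence lets every image token attend to every point-cloud token and vice versa, matching the abstract's description of merging the two modalities with a single, simple self-attention step.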
About the Journal
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.