Multimodal feature adaptive fusion for anchor-free 3D object detection

IF 3.4, Zone 2 (Computer Science), JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Applied Intelligence Pub Date : 2025-04-02 DOI:10.1007/s10489-025-06454-w

Yanli Wu, Junyin Wang, Hui Li, Xiaoxue Ai, Xiao Li


Citations: 0

Abstract




LiDAR and cameras are two key sensors that provide mutually complementary information for 3D detection in autonomous driving. Existing multimodal detection methods often decorate the original point cloud data with camera features to complete the detection, ignoring the mutual fusion between camera features and point cloud features. In addition, ground points scanned by LiDAR in natural scenes usually interfere significantly with the detection results, and existing methods fail to address this problem effectively. We present a simple yet efficient anchor-free 3D object detection method, which can better adapt to complex scenes through the adaptive fusion of multimodal features. First, we propose a fully convolutional bird’s-eye view reconstruction module to sense ground map geometry changes, reducing the interference of ground points on detection results. Second, a multimodal feature adaptive fusion module with local awareness is designed to improve the mutual fusion of camera and point cloud features. Finally, we introduce a scale-aware mini feature pyramid network (Mini-FPN) that can directly regress 3D bounding boxes from the augmented dense feature maps, boosting the network’s ability to detect scale-varying objects, and we additionally construct a scene-adaptive single-stage 3D detector in an anchor-free manner. Extensive experiments on the KITTI and nuScenes datasets validate our method’s competitive performance.
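The abstract does not spell out how the adaptive fusion module combines the two modalities. As a rough illustration of the general idea only, the sketch below shows one common form of adaptive fusion: a per-cell sigmoid gate that weights LiDAR features against camera features on a bird's-eye-view grid. The function names, the scalar gate, and the `gate_weights` parameter are all assumptions made for this example; this is not the authors' module and omits their local-awareness mechanism.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def adaptive_fuse(lidar_feat, camera_feat, gate_weights, gate_bias=0.0):
    """Fuse two feature vectors of one grid cell with a scalar gate (hypothetical).

    gate  = sigmoid(w . concat(lidar_feat, camera_feat) + b)
    fused = gate * lidar_feat + (1 - gate) * camera_feat
    """
    if len(lidar_feat) != len(camera_feat):
        raise ValueError("feature vectors must have the same length")
    concat = list(lidar_feat) + list(camera_feat)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    logit = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(logit)
    fused = [gate * l + (1.0 - gate) * c for l, c in zip(lidar_feat, camera_feat)]
    return fused, gate


def fuse_bev_maps(lidar_map, camera_map, gate_weights, gate_bias=0.0):
    """Apply adaptive_fuse independently at every cell of two aligned BEV grids.

    Each map is a nested list indexed as map[row][col] -> feature vector.
    """
    return [
        [adaptive_fuse(l, c, gate_weights, gate_bias)[0] for l, c in zip(lrow, crow)]
        for lrow, crow in zip(lidar_map, camera_map)
    ]
```

In a trained network the gate parameters would be learned, so the weighting can shift toward whichever modality is more informative at a given location. With all-zero gate parameters the gate is 0.5 and the fusion reduces to a plain average of the two feature vectors.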


Source journal

Applied Intelligence (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore

6.60

Self-citation rate

20.80%

Articles published

1361

Review time

5.9 months

About the journal: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems that are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.