SAMFNet: Scene-aware sampling and multi-stage fusion for multimodal 3D object detection

IF 6.2 | CAS Zone 2 (Engineering & Technology) | JCR Q1, ENGINEERING, MULTIDISCIPLINARY
Baotong Wang, Chenxing Xia, Xiuju Gao, Bin Ge, Kuan-Ching Li, Xianjin Fang, Yan Zhang, Yuan Yang
{"title":"SAMFNet: Scene-aware sampling and multi-stage fusion for multimodal 3D object detection","authors":"Baotong Wang ,&nbsp;Chenxing Xia ,&nbsp;Xiuju Gao ,&nbsp;Bin Ge ,&nbsp;Kuan-Ching Li ,&nbsp;Xianjin Fang ,&nbsp;Yan Zhang ,&nbsp;Yuan Yang","doi":"10.1016/j.aej.2025.03.129","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, multimodal 3D object detection (M3OD) that fuses the complementary information from LiDAR data and RGB images has gained significant attention. However, the inherent structural differences between point clouds and images pose fusion challenges, significantly hindering the exploration of correlations within multimodal data. To address this issue, this paper introduces an enhanced multimodal 3D object detection framework (SAMFNet), which leverages virtual point clouds generated from depth completion. Specifically, we design a scene-aware sampling module (SASM) that employs tailored sampling strategies for different bins based on the density distribution of point clouds. This effectively alleviates the detection bias problem while ensuring the key information of virtual points, significantly reducing the computational cost. In addition, we introduce a multi-stage feature fusion module (MSFFM) that embeds point-level and regional-adaptive feature fusion strategies to generate more informative multimodal features by fusing features with different granularities. To further improve the accuracy of model detection, we also introduce a confidence prediction branch unit (CPBU), which improves the detection accuracy by predicting the confidence of feature classification in the intermediate stage. Extensive experiments on the challenging KITTI dataset demonstrate the validity of our model.</div></div>","PeriodicalId":7484,"journal":{"name":"alexandria engineering journal","volume":"126 ","pages":"Pages 90-104"},"PeriodicalIF":6.2000,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"alexandria engineering journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1110016825004375","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

Recently, multimodal 3D object detection (M3OD), which fuses complementary information from LiDAR data and RGB images, has gained significant attention. However, the inherent structural differences between point clouds and images pose fusion challenges, significantly hindering the exploration of correlations within multimodal data. To address this issue, this paper introduces an enhanced multimodal 3D object detection framework (SAMFNet) that leverages virtual point clouds generated by depth completion. Specifically, we design a scene-aware sampling module (SASM) that applies tailored sampling strategies to different bins based on the density distribution of the point cloud, which effectively alleviates the detection bias problem while preserving the key information carried by the virtual points and significantly reduces the computational cost. In addition, we introduce a multi-stage feature fusion module (MSFFM) that embeds point-level and region-adaptive feature fusion strategies, generating more informative multimodal features by fusing features of different granularities. Finally, we introduce a confidence prediction branch unit (CPBU), which further improves detection accuracy by predicting the confidence of feature classification at an intermediate stage. Extensive experiments on the challenging KITTI dataset demonstrate the effectiveness of our model.
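The abstract does not specify how SASM defines its bins or per-bin sampling rates, so the following is only a minimal sketch of one plausible reading: density-aware subsampling of virtual points over depth bins, where dense (near) bins are thinned and sparse (far) bins are kept. The bin edges, point budget, inverse-density weighting, and the function name `scene_aware_sample` are all assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of density-aware bin sampling in the spirit of SASM.
# Bin edges, budget, and the inverse-density heuristic are assumptions.
import numpy as np

def scene_aware_sample(points: np.ndarray,
                       bin_edges=(0.0, 20.0, 40.0, 70.0),
                       budget: int = 16384,
                       rng=np.random.default_rng(0)) -> np.ndarray:
    """Subsample virtual points per depth bin, keeping proportionally more
    points in sparse (far) bins and fewer in dense (near) bins.

    points: (N, 3+) array; column 0 is assumed to be forward depth in meters.
    """
    depth = points[:, 0]
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    counts = np.array([np.sum((depth >= lo) & (depth < hi)) for lo, hi in bins])
    # Inverse-density weights: sparser bins get a larger share of the budget.
    weights = 1.0 / np.maximum(counts, 1)
    quotas = (budget * weights / weights.sum()).astype(int)
    kept = []
    for (lo, hi), count, quota in zip(bins, counts, quotas):
        idx = np.flatnonzero((depth >= lo) & (depth < hi))
        if count > quota:  # dense bin: randomly downsample to its quota
            idx = rng.choice(idx, size=quota, replace=False)
        kept.append(idx)   # sparse bin: keep all the points it has
    return points[np.concatenate(kept)]
```

Similarly, MSFFM's operators are not described beyond "point-level" and "region-adaptive", so here is a minimal sketch of only the point-level stage under a common convention: each virtual point is projected into the image, the image feature at that pixel is gathered, and the result is concatenated with the point's LiDAR feature. The name `point_level_fuse` and the nearest-neighbor gather are illustrative assumptions.

```python
# Hypothetical point-level fusion stage; the region-adaptive stage and the
# paper's actual fusion operators are not specified in the abstract.
import numpy as np

def point_level_fuse(point_feats: np.ndarray,  # (N, C_pts) LiDAR features
                     uv: np.ndarray,           # (N, 2) pixel coords of points
                     img_feats: np.ndarray     # (H, W, C_img) image features
                     ) -> np.ndarray:
    h, w, _ = img_feats.shape
    u = np.clip(uv[:, 0].astype(int), 0, w - 1)  # nearest-neighbor gather
    v = np.clip(uv[:, 1].astype(int), 0, h - 1)
    return np.concatenate([point_feats, img_feats[v, u]], axis=1)  # (N, C_pts + C_img)
```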
Source journal
Alexandria Engineering Journal (Engineering - General Engineering)
CiteScore: 11.20
Self-citation rate: 4.40%
Annual articles: 1015
Review time: 43 days
About the journal: Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. Alexandria Engineering Journal is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). The papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering