SAMFNet: Scene-aware sampling and multi-stage fusion for multimodal 3D object detection

IF 6.2 | CAS Zone 2 (Engineering & Technology) | JCR Q1, ENGINEERING, MULTIDISCIPLINARY
Baotong Wang, Chenxing Xia, Xiuju Gao, Bin Ge, Kuan-Ching Li, Xianjin Fang, Yan Zhang, Yuan Yang
{"title":"SAMFNet: Scene-aware sampling and multi-stage fusion for multimodal 3D object detection","authors":"Baotong Wang ,&nbsp;Chenxing Xia ,&nbsp;Xiuju Gao ,&nbsp;Bin Ge ,&nbsp;Kuan-Ching Li ,&nbsp;Xianjin Fang ,&nbsp;Yan Zhang ,&nbsp;Yuan Yang","doi":"10.1016/j.aej.2025.03.129","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, multimodal 3D object detection (M3OD) that fuses the complementary information from LiDAR data and RGB images has gained significant attention. However, the inherent structural differences between point clouds and images pose fusion challenges, significantly hindering the exploration of correlations within multimodal data. To address this issue, this paper introduces an enhanced multimodal 3D object detection framework (SAMFNet), which leverages virtual point clouds generated from depth completion. Specifically, we design a scene-aware sampling module (SASM) that employs tailored sampling strategies for different bins based on the density distribution of point clouds. This effectively alleviates the detection bias problem while ensuring the key information of virtual points, significantly reducing the computational cost. In addition, we introduce a multi-stage feature fusion module (MSFFM) that embeds point-level and regional-adaptive feature fusion strategies to generate more informative multimodal features by fusing features with different granularities. To further improve the accuracy of model detection, we also introduce a confidence prediction branch unit (CPBU), which improves the detection accuracy by predicting the confidence of feature classification in the intermediate stage. Extensive experiments on the challenging KITTI dataset demonstrate the validity of our model.</div></div>","PeriodicalId":7484,"journal":{"name":"alexandria engineering journal","volume":"126 ","pages":"Pages 90-104"},"PeriodicalIF":6.2000,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"alexandria engineering journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1110016825004375","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

Recently, multimodal 3D object detection (M3OD), which fuses complementary information from LiDAR data and RGB images, has gained significant attention. However, the inherent structural differences between point clouds and images pose fusion challenges, significantly hindering the exploration of correlations within multimodal data. To address this issue, this paper introduces an enhanced multimodal 3D object detection framework (SAMFNet) that leverages virtual point clouds generated by depth completion. Specifically, we design a scene-aware sampling module (SASM) that applies tailored sampling strategies to different bins based on the density distribution of the point cloud, which effectively alleviates the detection bias problem while preserving the key information carried by the virtual points and significantly reduces the computational cost. In addition, we introduce a multi-stage feature fusion module (MSFFM) that embeds point-level and region-adaptive feature fusion strategies, generating more informative multimodal features by fusing features of different granularities. Finally, we introduce a confidence prediction branch unit (CPBU), which further improves detection accuracy by predicting the confidence of feature classification at an intermediate stage. Extensive experiments on the challenging KITTI dataset demonstrate the effectiveness of our model.
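The abstract does not specify how SASM defines its bins or per-bin sampling rates, so the following is only a minimal sketch of one plausible reading: density-aware subsampling of virtual points over depth bins, where dense (near) bins are thinned and sparse (far) bins are kept. The bin edges, point budget, inverse-density weighting, and the function name `scene_aware_sample` are all assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of density-aware bin sampling in the spirit of SASM.
# Bin edges, budget, and the inverse-density heuristic are assumptions.
import numpy as np

def scene_aware_sample(points: np.ndarray,
                       bin_edges=(0.0, 20.0, 40.0, 70.0),
                       budget: int = 16384,
                       rng=np.random.default_rng(0)) -> np.ndarray:
    """Subsample virtual points per depth bin, keeping proportionally more
    points in sparse (far) bins and fewer in dense (near) bins.

    points: (N, 3+) array; column 0 is assumed to be forward depth in meters.
    """
    depth = points[:, 0]
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    counts = np.array([np.sum((depth >= lo) & (depth < hi)) for lo, hi in bins])
    # Inverse-density weights: sparser bins get a larger share of the budget.
    weights = 1.0 / np.maximum(counts, 1)
    quotas = (budget * weights / weights.sum()).astype(int)
    kept = []
    for (lo, hi), count, quota in zip(bins, counts, quotas):
        idx = np.flatnonzero((depth >= lo) & (depth < hi))
        if count > quota:  # dense bin: randomly downsample to its quota
            idx = rng.choice(idx, size=quota, replace=False)
        kept.append(idx)   # sparse bin: keep all the points it has
    return points[np.concatenate(kept)]
```

Similarly, MSFFM's operators are not described beyond "point-level" and "region-adaptive", so here is a minimal sketch of only the point-level stage under a common convention: each virtual point is projected into the image, the image feature at that pixel is gathered, and the result is concatenated with the point's LiDAR feature. The name `point_level_fuse` and the nearest-neighbor gather are illustrative assumptions.

```python
# Hypothetical point-level fusion stage; the region-adaptive stage and the
# paper's actual fusion operators are not specified in the abstract.
import numpy as np

def point_level_fuse(point_feats: np.ndarray,  # (N, C_pts) LiDAR features
                     uv: np.ndarray,           # (N, 2) pixel coords of points
                     img_feats: np.ndarray     # (H, W, C_img) image features
                     ) -> np.ndarray:
    h, w, _ = img_feats.shape
    u = np.clip(uv[:, 0].astype(int), 0, w - 1)  # nearest-neighbor gather
    v = np.clip(uv[:, 1].astype(int), 0, h - 1)
    return np.concatenate([point_feats, img_feats[v, u]], axis=1)  # (N, C_pts + C_img)
```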
Source journal
Alexandria Engineering Journal (Engineering - General Engineering)
CiteScore: 11.20
Self-citation rate: 4.40%
Annual articles: 1015
Review time: 43 days
About the journal: Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. Alexandria Engineering Journal is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). The papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering