PMDFN3D: Pre-mid dual fusion network for 3D object detection

IF 3.0 · Zone 3 (Engineering Technology) · JCR Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)

Digital Signal Processing Pub Date : 2025-06-10 DOI:10.1016/j.dsp.2025.105399

Haishun Du , Sen Wang , Wenzhe Zhang , Linbing Cao

Digital Signal Processing, Volume 166, Article 105399. https://www.sciencedirect.com/science/article/pii/S105120042500421X

Abstract

In recent years, multi-modality 3D object detection has gradually become the mainstream approach to 3D object detection. Effectively fusing information from point cloud data and image data nevertheless remains a significant challenge. Existing multi-modality 3D object detection models mainly use one of the pre-, mid-, or post-fusion strategies to fuse image data and point cloud data, and each of these strategies has shortcomings. Integrating multiple fusion strategies into a single framework is still a research gap in the field. To fill this gap, we propose a pre-mid dual fusion network for 3D object detection (PMDFN3D), which integrates pre-fusion and mid-fusion into a unified framework. Specifically, we first design a depth-guided cross-modality feature fusion module that integrates image and point features without requiring complex feature alignment operations. Then, we design a neighboring feature interaction attention module to mitigate the loss of point-feature precision caused by down-sampling operations in the point cloud backbone network. Finally, we design a simple object-level feature selector and an object-level feature-guided cross-modality feature fusion module, which adaptively integrate object-relevant image features with object-level point features. Experimental results on the SUN RGB-D dataset demonstrate that our network achieves state-of-the-art performance in 3D object detection.
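The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the two fusion stages it names, not the authors' method. It assumes a pinhole camera model, plain feature concatenation for pre-fusion, and single-head dot-product attention for mid-fusion. All function names (`backproject_depth`, `pre_fuse`, `mid_fuse`), the intrinsics, and the feature sizes are hypothetical choices made for illustration.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to camera-frame 3D points (pinhole model, assumed).

    Returns points (N, 3) and the (row, col) pixel index of each point, so
    every point keeps a direct link to its source pixel.
    """
    rows, cols = np.nonzero(depth > 0)          # skip pixels with no depth
    z = depth[rows, cols]
    x = (cols - cx) * z / fx
    y = (rows - cy) * z / fy
    return np.stack([x, y, z], axis=1), np.stack([rows, cols], axis=1)


def pre_fuse(point_feats, image_feats, pixels):
    """Pre-fusion sketch: append each point's own pixel feature to it."""
    sampled = image_feats[pixels[:, 0], pixels[:, 1]]   # (N, C_img)
    return np.concatenate([point_feats, sampled], axis=1)


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mid_fuse(object_feats, image_tokens):
    """Mid-fusion sketch: object-level features attend over image tokens."""
    d = object_feats.shape[1]
    scores = object_feats @ image_tokens.T / np.sqrt(d)   # (M, T)
    weights = softmax(scores, axis=1)                     # each row sums to 1
    return object_feats + weights @ image_tokens, weights


# Toy inputs: a 4x6 depth map with one invalid pixel and 8-channel image features.
rng = np.random.default_rng(0)
depth = rng.uniform(0.5, 4.0, size=(4, 6))
depth[0, 0] = 0.0
image_feats = rng.normal(size=(4, 6, 8))

points, pixels = backproject_depth(depth, fx=5.0, fy=5.0, cx=3.0, cy=2.0)
fused_points = pre_fuse(points, image_feats, pixels)      # xyz + image feature

object_feats = rng.normal(size=(3, 8))    # stand-in for object-level point features
image_tokens = image_feats.reshape(-1, 8)
fused_objects, attn = mid_fuse(object_feats, image_tokens)

print(fused_points.shape, fused_objects.shape)
```

Because each 3D point is generated from a depth pixel, the point-to-pixel correspondence is known by construction, which is one plausible reading of fusing "without requiring complex feature alignment operations". The attention weights in `mid_fuse` play the role of an adaptive selection of image features per object.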

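Source journal

Digital Signal Processing (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 5.30

Self-citation rate: 17.20%

Articles published: 435

Review time: 66 days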

About the journal: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:

• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy