PMDFN3D: Pre-mid dual fusion network for 3D object detection
Haishun Du, Sen Wang, Wenzhe Zhang, Linbing Cao
Journal: Digital Signal Processing, Volume 166, Article 105399 (impact factor 2.9; JCR Q2, Engineering, Electrical & Electronic)
Publication date: 2025-06-10
Publication type: Journal Article
DOI: 10.1016/j.dsp.2025.105399
URL: https://www.sciencedirect.com/science/article/pii/S105120042500421X
Citation count: 0
Abstract
In recent years, multi-modality 3D object detection has gradually become the mainstream approach to 3D object detection. Effectively fusing information from point cloud data and image data, however, remains a significant challenge. Existing multi-modality 3D object detection models mainly use one of the pre-, mid-, or post-fusion strategies to fuse image data and point cloud data, and each of these strategies has its own shortcomings. Integrating multiple fusion strategies into a single framework is still a research gap in the field. To fill this gap, we propose a pre-mid dual fusion network for 3D object detection (PMDFN3D), which integrates pre-fusion and mid-fusion into a unified framework. Specifically, we first design a depth-guided cross-modality feature fusion module that integrates image and point features without requiring complex feature alignment operations. Then, we design a neighboring feature interaction attention module to mitigate the loss of point feature precision caused by down-sampling operations in the point cloud backbone network. Finally, we design a simple object-level feature selector and an object-level feature-guided cross-modality feature fusion module, which together adaptively integrate object-relevant image features with object-level point features. Experimental results on the SUN RGB-D dataset demonstrate that our network achieves state-of-the-art performance in 3D object detection.
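The abstract names the two fusion stages but does not say how the modules are implemented. As a rough illustration of the difference between fusing before and fusing inside the network, the sketch below pairs a generic pre-fusion step (each point picks up the image feature at its projected pixel) with a generic mid-fusion step (each object-level feature attends over image tokens). The function names `pre_fuse` and `mid_fuse`, the pinhole projection, and the plain dot-product attention are all assumptions made for illustration; they are not the depth-guided or object-level feature-guided modules of PMDFN3D.

```python
import math


def pre_fuse(points, point_feats, image_feats, fx, fy, cx, cy):
    """Generic pre-fusion (assumed, not the paper's module): append to each
    point feature the image feature found at the point's projected pixel.

    points:      list of (x, y, z) in the camera frame
    point_feats: list of per-point feature lists
    image_feats: H x W x C nested lists of per-pixel features
    """
    height, width = len(image_feats), len(image_feats[0])
    channels = len(image_feats[0][0])
    fused = []
    for (x, y, z), feat in zip(points, point_feats):
        pixel_feat = [0.0] * channels  # default when the point is not visible
        if z > 0:
            # Pinhole projection to pixel coordinates (u = column, v = row).
            col = int(round(fx * x / z + cx))
            row = int(round(fy * y / z + cy))
            if 0 <= row < height and 0 <= col < width:
                pixel_feat = image_feats[row][col]
        fused.append(list(feat) + list(pixel_feat))
    return fused


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mid_fuse(object_feats, image_tokens):
    """Generic mid-fusion (assumed, not the paper's module): each object-level
    feature queries the image tokens with scaled dot-product attention, and the
    attended image feature is concatenated onto the object feature.

    Object features and image tokens must share the same dimension.
    """
    dim = len(image_tokens[0])
    fused = []
    for query in object_feats:
        scores = [
            sum(q * k for q, k in zip(query, token)) / math.sqrt(dim)
            for token in image_tokens
        ]
        weights = softmax(scores)
        attended = [
            sum(w * token[c] for w, token in zip(weights, image_tokens))
            for c in range(dim)
        ]
        fused.append(list(query) + attended)
    return fused


if __name__ == "__main__":
    # Toy 2 x 2 image with one feature channel per pixel.
    image = [[[1.0], [2.0]], [[3.0], [4.0]]]
    print(pre_fuse([(1.0, 1.0, 1.0)], [[0.5]], image, 1.0, 1.0, 0.0, 0.0))
    print(mid_fuse([[0.0, 0.0]], [[1.0, 3.0], [3.0, 1.0]]))
```

In this toy form, pre-fusion enriches every raw point before any backbone runs, while mid-fusion only touches the much smaller set of object-level features, which is one common reason to combine the two stages.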
About the journal:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy