DFNet: A Dual LiDAR–Camera Fusion 3-D Object Detection Network Under Feature Degradation Condition
Authors: Tao Ye; Ruohan Liu; Chengzu Min; Yuliang Li; Xiaosong Li
IEEE Sensors Journal, vol. 25, no. 9, pp. 16223-16234, published 2025-03-21
DOI: 10.1109/JSEN.2025.3551149 — https://ieeexplore.ieee.org/document/10934719/
Cited by: 0
Abstract
LiDAR-camera fusion is widely used in 3-D perception tasks. The hierarchical feature abstraction of deep networks helps capture detailed information from point clouds and RGB images. However, this abstraction inevitably discards some useful information while extracting salient features, causing feature degradation. The resulting deterioration of LiDAR-camera fusion is a challenging problem: it weakens object recognition and lowers detection accuracy. To address this problem, we propose a dual LiDAR-camera fusion network (DFNet) based on cross-modal compensation and feature enhancement. We design a multimodal feature extraction (MFE) module that complements the sparse point-cloud features with image features while focusing on their spatial information. We then introduce a multiscale feature aggregation (MFA) module that generates bird's-eye view (BEV) representations of the features and produces feature proposals, which are fed to the voxel-grid aggregation (VGA) module to obtain grid-pooled features. In parallel, the VGA module receives feature proposals extracted from the image backbone and projects the point cloud through voxels to obtain voxel-fused features. Finally, we aggregate the grid-pooled and voxel-fused features to produce more informative fused features. Results on the KITTI dataset show that DFNet outperforms most 3-D object detection methods, achieving 77.88% 3-D detection mAP, which indicates that our method effectively handles feature degradation.
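To make the dual-branch flow described above concrete, the sketch below wires the three named modules (MFE, MFA, VGA) into a minimal PyTorch forward pass. The module names come from the abstract, but every layer choice, tensor shape, and fusion operator here is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of the DFNet-style fusion flow described in the abstract.
# Layer sizes, shapes, and fusion operators are hypothetical placeholders.
import torch
import torch.nn as nn


class MFE(nn.Module):
    """Multimodal feature extraction: complement sparse point-cloud features
    with image features (here: simple channel concatenation + 1x1 conv)."""
    def __init__(self, lidar_ch, img_ch, out_ch):
        super().__init__()
        self.fuse = nn.Conv2d(lidar_ch + img_ch, out_ch, kernel_size=1)

    def forward(self, lidar_feat, img_feat):
        return self.fuse(torch.cat([lidar_feat, img_feat], dim=1))


class MFA(nn.Module):
    """Multiscale feature aggregation: collapse the height axis of voxel
    features into a BEV map and aggregate two scales."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.down = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, voxel_feat):            # (B, C, D, H, W)
        bev = voxel_feat.mean(dim=2)          # (B, C, H, W) bird's-eye view
        x = self.reduce(bev)
        return x + self.up(self.down(x))      # two-scale aggregation


class VGA(nn.Module):
    """Voxel-grid aggregation: pool BEV proposals into grid features and
    fuse them with image-branch proposals."""
    def __init__(self, ch):
        super().__init__()
        self.grid_pool = nn.AdaptiveAvgPool2d((8, 8))
        self.mix = nn.Linear(2 * ch, ch)

    def forward(self, bev_proposals, img_proposals):
        grid_pooled = self.grid_pool(bev_proposals).flatten(2).mean(-1)
        voxel_fused = self.grid_pool(img_proposals).flatten(2).mean(-1)
        return self.mix(torch.cat([grid_pooled, voxel_fused], dim=1))


if __name__ == "__main__":
    B, C, D, H, W = 2, 32, 8, 64, 64
    lidar_bev, img_bev = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
    voxels = torch.randn(B, C, D, H, W)
    mfe, mfa, vga = MFE(C, C, C), MFA(C, C), VGA(C)
    fused = vga(mfa(voxels), mfe(lidar_bev, img_bev))
    print(fused.shape)  # torch.Size([2, 32])
```

The point of the sketch is only the data flow: one branch produces BEV proposals from voxelized LiDAR features (MFA), the other produces image-compensated features (MFE), and VGA pools and mixes the two into a single fused representation, mirroring the grid-pooled / voxel-fused aggregation the abstract describes.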
About the journal:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative sensor data; detection, estimation, and classification based on sensor data)
- Sensors in Industrial Practice