DFNet: A Dual LiDAR–Camera Fusion 3-D Object Detection Network Under Feature Degradation Condition

IF 4.3 · Zone 2 (Multidisciplinary) · Q1 ENGINEERING, ELECTRICAL & ELECTRONIC
Tao Ye;Ruohan Liu;Chengzu Min;Yuliang Li;Xiaosong Li
{"title":"特征退化条件下双激光雷达-相机融合的三维目标检测网络","authors":"Tao Ye;Ruohan Liu;Chengzu Min;Yuliang Li;Xiaosong Li","doi":"10.1109/JSEN.2025.3551149","DOIUrl":null,"url":null,"abstract":"LiDAR-camera fusion is widely used in 3-D perception tasks. In LiDAR and camera sensing tasks, the hierarchical feature abstraction capability possessed by the deep network is beneficial to capture the detailed information from point clouds and RGB images. However, it tends to filter some of the information to extract important features, where the problem of feature degradation due to loss of useful information is inevitable. The deterioration of LiDAR-camera fusion due to feature degradation, brought about by this factor, becomes a challenging problem. It reduces object recognition and leads to decreased detection accuracy. To address this problem, we propose a dual LiDAR-camera fusion network (DFNet) based on cross-modal compensation and feature enhancement. We design a multimodal feature extraction (MFE) module to complement the sparse features of the point cloud utilizing image features and focusing on the spatial information of the features. Then, we introduce a multiscale feature aggregation (MFA) module to generate bird’s-eye view (BEV) representations of the features, which generates feature proposals that are then input to the voxel-grid aggregation (VGA) module to obtain the grid-pooled features. Meanwhile, the VGA module receives the feature proposals extracted from the image backbone and projects the point cloud through voxels to obtain voxel-fused features. Finally, we aggregate the grid-pooled features and voxel-fused features to produce more informative fused features. The results on the KITTI dataset illustrate that DFNet outperforms most 3-D object detection methods, achieving the 3-D detection performance of 77.88% mAP, which indicates that our method is effective in dealing with feature degradation.","PeriodicalId":447,"journal":{"name":"IEEE Sensors Journal","volume":"25 9","pages":"16223-16234"},"PeriodicalIF":4.3000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DFNet: A Dual LiDAR–Camera Fusion 3-D Object Detection Network Under Feature Degradation Condition\",\"authors\":\"Tao Ye;Ruohan Liu;Chengzu Min;Yuliang Li;Xiaosong Li\",\"doi\":\"10.1109/JSEN.2025.3551149\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"LiDAR-camera fusion is widely used in 3-D perception tasks. In LiDAR and camera sensing tasks, the hierarchical feature abstraction capability possessed by the deep network is beneficial to capture the detailed information from point clouds and RGB images. However, it tends to filter some of the information to extract important features, where the problem of feature degradation due to loss of useful information is inevitable. The deterioration of LiDAR-camera fusion due to feature degradation, brought about by this factor, becomes a challenging problem. It reduces object recognition and leads to decreased detection accuracy. To address this problem, we propose a dual LiDAR-camera fusion network (DFNet) based on cross-modal compensation and feature enhancement. We design a multimodal feature extraction (MFE) module to complement the sparse features of the point cloud utilizing image features and focusing on the spatial information of the features. 
Then, we introduce a multiscale feature aggregation (MFA) module to generate bird’s-eye view (BEV) representations of the features, which generates feature proposals that are then input to the voxel-grid aggregation (VGA) module to obtain the grid-pooled features. Meanwhile, the VGA module receives the feature proposals extracted from the image backbone and projects the point cloud through voxels to obtain voxel-fused features. Finally, we aggregate the grid-pooled features and voxel-fused features to produce more informative fused features. The results on the KITTI dataset illustrate that DFNet outperforms most 3-D object detection methods, achieving the 3-D detection performance of 77.88% mAP, which indicates that our method is effective in dealing with feature degradation.\",\"PeriodicalId\":447,\"journal\":{\"name\":\"IEEE Sensors Journal\",\"volume\":\"25 9\",\"pages\":\"16223-16234\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Sensors Journal\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10934719/\",\"RegionNum\":2,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Sensors Journal","FirstCategoryId":"103","ListUrlMain":"https://ieeexplore.ieee.org/document/10934719/","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

LiDAR-camera fusion is widely used in 3-D perception tasks. In LiDAR and camera sensing tasks, the hierarchical feature abstraction of deep networks helps capture detailed information from point clouds and RGB images. However, such networks tend to filter out some information when extracting important features, so feature degradation caused by the loss of useful information is inevitable. The resulting deterioration of LiDAR-camera fusion is a challenging problem: it weakens object recognition and lowers detection accuracy. To address this problem, we propose a dual LiDAR-camera fusion network (DFNet) based on cross-modal compensation and feature enhancement. We design a multimodal feature extraction (MFE) module that complements the sparse features of the point cloud with image features while focusing on the spatial information of the features. We then introduce a multiscale feature aggregation (MFA) module that generates bird's-eye view (BEV) representations of the features and produces feature proposals, which are fed to the voxel-grid aggregation (VGA) module to obtain grid-pooled features. Meanwhile, the VGA module receives feature proposals extracted by the image backbone and projects the point cloud through voxels to obtain voxel-fused features. Finally, we aggregate the grid-pooled and voxel-fused features to produce more informative fused features. Results on the KITTI dataset show that DFNet outperforms most 3-D object detection methods, achieving 77.88% mAP for 3-D detection, which indicates that our method handles feature degradation effectively.
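To make the two fusion ideas in the abstract concrete, below is a minimal PyTorch sketch: image features are bilinearly sampled at projected LiDAR points to compensate sparse point features (MFE-style cross-modal compensation), and per-point features are max-pooled into a BEV grid (MFA-style aggregation). The module names, channel sizes, detection ranges, and projection convention are illustrative assumptions, not the authors' published implementation.

```python
# Hedged sketch of cross-modal compensation + BEV pooling; all names,
# shapes, and the projection matrix convention here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalCompensation(nn.Module):
    """MFE-style sketch: complement sparse point features with image features."""

    def __init__(self, point_dim=64, image_dim=256, out_dim=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(point_dim + image_dim, out_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, point_feats, points_xyz, image_feats, proj_matrix, image_hw):
        # points_xyz: (N, 3) LiDAR coords; proj_matrix: (3, 4) LiDAR-to-pixel.
        n = points_xyz.shape[0]
        homo = torch.cat([points_xyz, points_xyz.new_ones(n, 1)], dim=1)  # (N, 4)
        uvw = homo @ proj_matrix.t()                                      # (N, 3)
        # In practice, points behind the camera should be masked out first.
        uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                     # pixel coords

        # Normalize pixel coordinates to [-1, 1] for grid_sample.
        h, w = image_hw
        grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], dim=1)
        grid = (grid * 2.0 - 1.0).view(1, 1, n, 2)

        # image_feats: (1, C, H', W') map from the image backbone; out-of-image
        # points receive zeros (grid_sample's default padding).
        sampled = F.grid_sample(image_feats, grid, align_corners=True)    # (1, C, 1, N)
        sampled = sampled.squeeze(0).squeeze(1).t()                       # (N, C)

        # Image features compensate the sparse point features.
        return self.fuse(torch.cat([point_feats, sampled], dim=1))        # (N, out_dim)


def bev_max_pool(point_feats, points_xyz, grid_size=(128, 128),
                 x_range=(0.0, 70.4), y_range=(-40.0, 40.0)):
    """MFA-style sketch: scatter point features into a BEV grid by max pooling."""
    nx, ny = grid_size
    c = point_feats.shape[1]
    ix = ((points_xyz[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * nx).long().clamp(0, nx - 1)
    iy = ((points_xyz[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * ny).long().clamp(0, ny - 1)
    flat = (ix * ny + iy).unsqueeze(1).expand(-1, c)       # (N, C) flattened cell index
    bev = point_feats.new_full((nx * ny, c), float('-inf'))
    bev = bev.scatter_reduce(0, flat, point_feats, reduce='amax', include_self=True)
    bev = torch.where(torch.isinf(bev), torch.zeros_like(bev), bev)  # empty cells -> 0
    return bev.view(nx, ny, c).permute(2, 0, 1)            # (C, nx, ny)


if __name__ == "__main__":
    pts = torch.rand(1000, 3) * torch.tensor([70.0, 80.0, 4.0]) - torch.tensor([0.0, 40.0, 2.0])
    pfeat = torch.randn(1000, 64)
    ifeat = torch.randn(1, 256, 47, 156)              # downsampled image feature map
    P = torch.tensor([[700.0, 0.0, 620.0, 0.0],       # toy LiDAR-to-pixel projection,
                      [0.0, 700.0, 190.0, 0.0],       # not a real KITTI calibration
                      [0.0, 0.0, 1.0, 0.0]])
    fused = CrossModalCompensation()(pfeat, pts, ifeat, P, image_hw=(375, 1242))
    bev = bev_max_pool(fused, pts)
    print(fused.shape, bev.shape)                     # (1000, 128), (128, 128, 128)
```

Bilinear sampling via grid_sample keeps the compensation step differentiable end to end, and scatter_reduce with amax is a simple single-scale stand-in for the paper's multiscale BEV aggregation.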
Source Journal
IEEE Sensors Journal (Engineering: Electrical & Electronic)
CiteScore: 7.70
Self-citation rate: 14.00%
Annual publications: 2058
Review time: 5.2 months
Journal scope: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensor-actuators. IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative, sensor data; detection, estimation, and classification based on sensor data)
-Sensors in Industrial Practice