LRTG3D: Large receptive field 3D object detection with truncated Gaussian denoising query

IF 3.3 | Zone 3, Computer Science | JCR Q2, Computer Science, Artificial Intelligence

Pattern Recognition Letters | Pub Date: 2025-09-08 | DOI: 10.1016/j.patrec.2025.08.024

Changfeng Li, Xiaonan Mao, Zhiwei Ning, Jie Yang, Wei Liu

Volume 197, Pages 346-352 | Journal Article | https://www.sciencedirect.com/science/article/pii/S0167865525003095

Citations: 0

Abstract


LRTG3D: Large receptive field 3D object detection with truncated Gaussian denoising query

In recent years, 3D object detection has emerged as a critical component of autonomous driving systems, drawing significant research interest. Classic voxel-based sparse convolutional neural networks (CNNs) have been widely used in single-modality detection owing to their efficiency and accuracy in feature extraction. However, as detection heads become increasingly complex, the feature extraction capabilities of the backbone networks often fall short, necessitating improvements in feature richness. It is therefore crucial to enhance the features extracted by the backbone so that they better meet the needs of object detection tasks. In this paper, we propose a series of synergistic enhancements to the plain sparse CNN backbone. We introduce z-preserved downsampling (Z-PD) to expand the receptive field while preserving critical height information. At the core of our backbone is the novel dual-focus receptive field (DFRF) block, which integrates our proposed dual-scale spatial convolution (DSSC), to balance a large receptive field with precision, and hybrid-focus sparse convolution (HFSC), to robustly capture foreground features. Additionally, to accelerate convergence, we introduce a truncated Gaussian denoising query (T-GDQ) in the decoder to better align with the enhanced features. Extensive experiments on the nuScenes and Waymo datasets validate the effectiveness of the proposed method. Notably, our model achieves 67.3 mAP and 71.9 NDS on the nuScenes dataset, outperforming leading 3D detection approaches.
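The abstract does not say how the truncated Gaussian denoising query is constructed, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a DN-DETR-style setup in which ground-truth box centers are perturbed to form extra decoder queries, and that "truncated Gaussian" means the per-axis noise is drawn from a zero-mean Gaussian restricted to a bounded interval. The function names and the parameters `sigma`, `bound`, and `copies` are hypothetical.

```python
import random


def truncated_gaussian(sigma, bound, rng):
    """Sample from N(0, sigma^2) restricted to [-bound, bound] (rejection sampling)."""
    if sigma <= 0 or bound <= 0:
        raise ValueError("sigma and bound must be positive")
    while True:
        x = rng.gauss(0.0, sigma)
        if abs(x) <= bound:
            return x


def make_denoising_queries(gt_centers, sigma=0.5, bound=1.0, copies=2, seed=0):
    """Return (noised_center, gt_index) pairs.

    Assumption: each ground-truth center is perturbed `copies` times with
    independent per-axis truncated Gaussian noise, and the decoder would be
    trained to recover the original box from each noised query.
    """
    rng = random.Random(seed)
    queries = []
    for idx, center in enumerate(gt_centers):
        for _ in range(copies):
            noised = tuple(c + truncated_gaussian(sigma, bound, rng) for c in center)
            queries.append((noised, idx))
    return queries


# Two hypothetical ground-truth centers (x, y, z) in meters.
gt = [(10.0, -3.0, 0.5), (25.0, 4.0, 1.0)]
queries = make_denoising_queries(gt)
```

Truncation keeps every noised query within `bound` of its ground-truth center on each axis, so no query is displaced arbitrarily far, which is one plausible reason bounded noise could help convergence. In practice a library sampler such as `scipy.stats.truncnorm` could replace the rejection loop.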


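Source journal: Pattern Recognition Letters (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 12.40

Self-citation rate: 5.90%

Articles published: 287

Review time: 9.1 months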

About the journal: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.