Changfeng Li, Xiaonan Mao, Zhiwei Ning, Jie Yang, Wei Liu
Pattern Recognition Letters, Volume 197, Pages 346-352. Published 2025-09-08. DOI: 10.1016/j.patrec.2025.08.024. https://www.sciencedirect.com/science/article/pii/S0167865525003095
LRTG3D: Large receptive field 3D object detection with truncated Gaussian denoising query
In recent years, 3D object detection has emerged as a critical component in autonomous driving systems, drawing significant research interest. Classic voxel-based sparse convolutional neural networks (CNNs) have been widely used in single-modality detection due to their efficiency and accuracy in feature extraction. However, as detection heads become increasingly complex, the feature extraction capabilities of the backbone networks often fall short, necessitating improvements in feature richness. It is therefore crucial to enhance the features extracted by the backbone so that they better meet the needs of object detection tasks. In this paper, we propose a series of synergistic enhancements to the plain sparse CNN backbone. We introduce z-preserved downsampling (Z-PD) to expand the receptive field while preserving critical height information. At the core of our backbone is the novel dual-focus receptive field (DFRF) block, which integrates our proposed dual-scale spatial convolution (DSSC), to balance a large receptive field with precision, and hybrid-focus sparse convolution (HFSC), to robustly capture foreground features. Additionally, to accelerate convergence, we introduce a truncated Gaussian denoising query (T-GDQ) in the decoder to better align with the enhanced features. Extensive experiments on the nuScenes and Waymo datasets validate the effectiveness of the proposed method. Notably, our model achieves 67.3 mAP and 71.9 NDS on the nuScenes dataset, showing superior performance over leading 3D detection approaches.
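The abstract names a truncated Gaussian denoising query but does not give its formulation. As a rough illustration of the general idea only, the sketch below perturbs ground-truth box centers with Gaussian noise that is clipped to a fixed range by rejection sampling, which is one common way to build bounded noisy queries for a denoising decoder. The function names, `sigma`, `bound`, and the choice to perturb only the center are assumptions made for this example and are not taken from the paper.

```python
import random


def truncated_gaussian(sigma, bound, rng):
    """Draw from N(0, sigma^2) restricted to [-bound, bound] (rejection sampling)."""
    while True:
        x = rng.gauss(0.0, sigma)
        if abs(x) <= bound:
            return x


def noisy_queries(gt_centers, sigma=0.5, bound=1.0, seed=0):
    """Perturb each ground-truth (x, y, z) center with bounded Gaussian noise.

    Hypothetical helper: sigma and bound are illustrative values in meters.
    """
    rng = random.Random(seed)
    return [
        tuple(c + truncated_gaussian(sigma, bound, rng) for c in center)
        for center in gt_centers
    ]


# Two made-up ground-truth box centers.
gt = [(10.0, -4.0, 0.5), (22.5, 3.0, 1.0)]
queries = noisy_queries(gt)
```

Because the noise is bounded, every noisy query stays within `bound` of its ground-truth center along each axis, so a denoising target is never displaced arbitrarily far from the object it is meant to recover.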
About the journal:
Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.