Boosting 3D point-based object detection by reducing information loss caused by discontinuous receptive fields

Impact Factor: 8.6 | JCR: Q1 | Category: Remote Sensing

International journal of applied earth observation and geoinformation : ITC journal, Volume 132, Article 104049 | Pub Date: 2024-08-01 | DOI: 10.1016/j.jag.2024.104049

Ao Liang, Haiyang Hua, Jian Fang, Huaici Zhao, Tianci Liu

Citations: 0

Abstract

Boosting 3D point-based object detection by reducing information loss caused by discontinuous receptive fields

Point-based 3D object detection is lightweight and fast at inference, which makes it valuable in engineering fields such as intelligent transportation and autonomous driving. However, current advanced methods learn features only from the provided point cloud and neglect the active role of unoccupied space. This causes the discontinuous receptive field (DRF) problem, which leads to the loss of semantic and geometric information about objects. To address this issue, we propose DRF-SSD, a new end-to-end single-stage point-based model. DRF-SSD uses a PointNet++-style 3D backbone to maintain fast inference. In the Neck structure, point-wise features are then projected onto a plane, and local and global information is aggregated by the Hierarchical Encoding–Decoding (HED) and Hybrid Transformer (HT) modules. HED fills in features for unoccupied space through convolutional layers, enhancing local features through interaction with the features of occupied space during learning. HT further expands the receptive field using the global learning ability of transformers. The spatial transformation and learning processes in HED and HT involve only key points, and HED has a special structure that maintains the sparsity of feature maps, preserving the model's inference efficiency. Finally, query features are back-projected onto points for feature enhancement and fed into the detection head for prediction. Extensive experiments on the KITTI dataset demonstrate that DRF-SSD achieves higher detection accuracy than previous methods. Specifically, it improves 3D Average Precision ($AP_{3D}$) by 2.25%, 0.66%, and 0.42% under the easy, moderate, and hard settings, respectively. Additionally, the method enables other point-based detectors to achieve substantial gains, demonstrating its effectiveness. Our code will be made available at https://github.com/AlanLiangC/DRF-SSD.git.
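The project-fill-back-project idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`scatter_to_bev`, `fill_unoccupied`), the grid parameters, and the fixed 3x3 averaging filter (a stand-in for the learned convolutional layers in HED) are all assumptions made for illustration, and the Hybrid Transformer stage is omitted entirely.

```python
import numpy as np

def scatter_to_bev(xy, feats, grid_min, cell, shape):
    """Average point features into the plane cells they fall in (hypothetical helper)."""
    h, w = shape
    idx = np.floor((xy - grid_min) / cell).astype(int)
    cols = np.clip(idx[:, 0], 0, w - 1)  # x coordinate -> column
    rows = np.clip(idx[:, 1], 0, h - 1)  # y coordinate -> row
    grid = np.zeros((h, w, feats.shape[1]))
    count = np.zeros((h, w))
    np.add.at(grid, (rows, cols), feats)  # unbuffered add handles repeated cells
    np.add.at(count, (rows, cols), 1)
    occupied = count > 0
    grid[occupied] /= count[occupied][:, None]
    return grid, occupied, rows, cols

def fill_unoccupied(grid, occupied):
    """Mean of occupied cells in each 3x3 window.

    A fixed filter used here only as a stand-in for learned convolutions:
    empty cells next to occupied ones receive features, so the receptive
    field is no longer interrupted by unoccupied space.
    """
    h, w, _ = grid.shape
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))
    padded_occ = np.pad(occupied.astype(float), 1)
    total = np.zeros_like(grid)
    n = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + h, dx:dx + w]
            n += padded_occ[dy:dy + h, dx:dx + w]
    out = np.zeros_like(grid)
    has_neighbor = n > 0
    out[has_neighbor] = total[has_neighbor] / n[has_neighbor][:, None]
    return out

# Two key points with one feature channel, separated by one empty cell.
xy = np.array([[0.5, 0.5], [2.5, 0.5]])
feats = np.array([[1.0], [3.0]])
grid, occupied, rows, cols = scatter_to_bev(xy, feats, np.array([0.0, 0.0]), 1.0, (3, 4))
dense = fill_unoccupied(grid, occupied)
point_context = dense[rows, cols]  # back-project plane features onto the points
```

In this toy case the empty cell between the two points receives the mean of its occupied neighbors, which is the kind of interaction between occupied and unoccupied space that the abstract attributes to HED. A real model would learn the filter weights and keep the feature map sparse rather than densifying the whole grid.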

Source journal

International journal of applied earth observation and geoinformation : ITC journal

Global and Planetary Change; Management, Monitoring, Policy and Law; Earth-Surface Processes; Computers in Earth Sciences

CiteScore

12.00

Self-citation rate

0.00%

Articles published

Review time

77 days

Journal description: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that use earth observation data for natural resource and environmental inventory and management. These data originate primarily from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. The journal explores conceptual and data-driven approaches to natural resources such as forests, agricultural land, soils, and water, and to environmental concerns such as biodiversity, land degradation, and hazards. It covers geoinformation themes such as data capture, databasing, visualization, interpretation, data quality, and spatial uncertainty.