Dp-M3D: Monocular 3D object detection algorithm with depth perception capability

IF 7.2 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems Pub Date : 2025-04-11 DOI:10.1016/j.knosys.2025.113539

Peicheng Shi, Xinlong Dong, Runshuai Ge, Zhiqiang Liu, Aixi Yang

Volume 318, Article 113539. Full text: https://www.sciencedirect.com/science/article/pii/S0950705125005854

Citations: 0

Abstract

Dp-M3D: Monocular 3D object detection algorithm with depth perception capability

Considering the limitations of monocular 3D object detection in depth information and perception ability, we introduce a novel monocular 3D object detection algorithm, Dp-M3D, equipped with depth perception capabilities. To effectively model long-range feature dependencies during the fusion of depth maps and image features, we introduce a Transformer Feature Fusion Encoder (TFFEn). TFFEn integrates depth and image features, enabling more comprehensive long-range feature modeling. This enhances depth perception, ultimately improving the accuracy of 3D object detection. To enhance the detection ability of truncated objects at the edges of an image, we propose a Feature Enhancement method based on Deformable Convolution (FEDC). FEDC leverages depth confidence guidance to determine the deformation offset of the 3D bounding box, aligning features more effectively and improving depth perception. Furthermore, to address the issue of anchor box ranking, where candidate boxes with accurate depth predictions but low classification confidence are suppressed, we propose a Depth-perception Non-Maximum Suppression (Dp-NMS) algorithm. Dp-NMS refines the selection process by incorporating the product of classification confidence and depth confidence, ensuring that candidate boxes are ranked effectively and the most suitable detection box is retained. Experimental results on the challenging KITTI 3D object detection dataset demonstrate that the proposed method achieves $AP_{3D}$ scores of 23.41%, 13.65%, and 12.91% in the easy, moderate, and hard categories, respectively. Our approach outperforms state-of-the-art monocular 3D object detection algorithms based on image and image-depth map fusion, with particularly significant improvements in detecting edge-truncated objects.
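
The ranking idea behind Dp-NMS can be illustrated with a minimal sketch. The abstract states only that candidates are ranked by the product of classification confidence and depth confidence; everything else below is an assumption made for illustration, including the function names `iou` and `depth_aware_nms`, the use of axis-aligned 2D boxes, the greedy suppression loop, and the 0.5 IoU threshold. It should not be read as the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def depth_aware_nms(
    boxes: Sequence[Box],
    cls_conf: Sequence[float],
    depth_conf: Sequence[float],
    iou_thresh: float = 0.5,  # assumed value, not taken from the paper
) -> List[int]:
    """Greedy NMS that ranks candidates by cls_conf * depth_conf.

    Returns the indices of the kept boxes, highest combined score first.
    """
    scores = [c * d for c, d in zip(cls_conf, depth_conf)]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a candidate only if it does not overlap a better-ranked kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    cls_conf = [0.9, 0.6, 0.8]
    depth_conf = [0.3, 0.9, 0.7]
    # Ranking by the product keeps box 1 (good depth) over box 0.
    print(depth_aware_nms(boxes, cls_conf, depth_conf))
    # With depth confidence fixed at 1.0 this reduces to ordinary NMS.
    print(depth_aware_nms(boxes, cls_conf, [1.0, 1.0, 1.0]))
```

In this toy input, boxes 0 and 1 overlap heavily. Ranking by classification confidence alone keeps box 0 and suppresses box 1, whereas ranking by the product keeps box 1, whose depth confidence is much higher. This is the kind of case the abstract describes, where a candidate with an accurate depth prediction but low classification confidence would otherwise be suppressed.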

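Source journal

Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months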

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.