{"title":"SAMUNet:利用形状感知Mini-Unet增强自动驾驶中基于柱子的3D物体检测","authors":"Liping Zhu, Xuan Li, Bohui Li, Chengyang Li, Bingyao Wang, XianXiang Chang","doi":"10.1016/j.imavis.2025.105703","DOIUrl":null,"url":null,"abstract":"<div><div>Pillar-based 3D object detection methods outperform traditional point-based and voxel-based methods in terms of speed. However, existing methods struggle with accurately detecting large objects in complex environments due to the limitations in capturing global spatial dependencies. To address these issues, this paper proposes Shape-aware Mini-Unet Network (SAMUNet), a simple yet effective hierarchical 3D object detection network. SAMUNet incorporates multiple Sparse Mini-Unet blocks and a Shape-aware Center Head. Concretely, after converting the original point cloud into pillars, we first progressively reduce the spatial distance between distant features through downsampling in the Sparse Mini-Unet block. Then, we recover lost details through multi-scale feature fusion, enhancing the model’s ability to detect various objects. Unlike other methods, the upsampling operation in the Sparse Mini-Unet block only processes the effective feature coverage area of the intermediate feature map, significantly reducing computational costs. Finally, to further improve the accuracy of bounding box regression, we introduce Shape-aware Center Head, which models the geometric information of the bounding box’s offset direction and scale using 3D Shape-aware IoU. Extensive experiments on the nuScenes and Waymo datasets demonstrate that SAMUNet excels in detecting large objects and overall outperforms current state-of-the-art detectors, achieving 72.0% NDS and 67.7% mAP.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105703"},"PeriodicalIF":4.2000,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SAMUNet: Enhancing pillar-based 3D object detection in autonomous driving with Shape-aware Mini-Unet\",\"authors\":\"Liping Zhu, Xuan Li, Bohui Li, Chengyang Li, Bingyao Wang, XianXiang Chang\",\"doi\":\"10.1016/j.imavis.2025.105703\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Pillar-based 3D object detection methods outperform traditional point-based and voxel-based methods in terms of speed. However, existing methods struggle with accurately detecting large objects in complex environments due to the limitations in capturing global spatial dependencies. To address these issues, this paper proposes Shape-aware Mini-Unet Network (SAMUNet), a simple yet effective hierarchical 3D object detection network. SAMUNet incorporates multiple Sparse Mini-Unet blocks and a Shape-aware Center Head. Concretely, after converting the original point cloud into pillars, we first progressively reduce the spatial distance between distant features through downsampling in the Sparse Mini-Unet block. Then, we recover lost details through multi-scale feature fusion, enhancing the model’s ability to detect various objects. Unlike other methods, the upsampling operation in the Sparse Mini-Unet block only processes the effective feature coverage area of the intermediate feature map, significantly reducing computational costs. 
Finally, to further improve the accuracy of bounding box regression, we introduce Shape-aware Center Head, which models the geometric information of the bounding box’s offset direction and scale using 3D Shape-aware IoU. Extensive experiments on the nuScenes and Waymo datasets demonstrate that SAMUNet excels in detecting large objects and overall outperforms current state-of-the-art detectors, achieving 72.0% NDS and 67.7% mAP.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"162 \",\"pages\":\"Article 105703\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-08-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625002914\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625002914","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Pillar-based 3D object detection methods outperform traditional point-based and voxel-based methods in terms of speed. However, existing methods struggle to accurately detect large objects in complex environments because of their limited ability to capture global spatial dependencies. To address these issues, this paper proposes the Shape-aware Mini-Unet Network (SAMUNet), a simple yet effective hierarchical 3D object detection network. SAMUNet incorporates multiple Sparse Mini-Unet blocks and a Shape-aware Center Head. Concretely, after converting the original point cloud into pillars, we first progressively reduce the spatial distance between distant features through downsampling in the Sparse Mini-Unet block. Then, we recover lost details through multi-scale feature fusion, enhancing the model's ability to detect objects of varying sizes. Unlike other methods, the upsampling operation in the Sparse Mini-Unet block processes only the effective feature coverage area of the intermediate feature map, significantly reducing computational costs. Finally, to further improve the accuracy of bounding box regression, we introduce the Shape-aware Center Head, which models the geometric information of the bounding box's offset direction and scale using a 3D Shape-aware IoU. Extensive experiments on the nuScenes and Waymo datasets demonstrate that SAMUNet excels at detecting large objects and overall outperforms current state-of-the-art detectors, achieving 72.0% NDS and 67.7% mAP.
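The abstract does not spell out the formulation of the 3D Shape-aware IoU used by the Shape-aware Center Head, but it builds on the standard volumetric IoU between 3D boxes. The following is a minimal PyTorch sketch of plain axis-aligned 3D IoU for paired boxes, not the paper's implementation: the box layout (cx, cy, cz, w, l, h), the function name, and the omission of yaw rotation are illustrative assumptions.

    import torch

    def axis_aligned_iou_3d(boxes_a: torch.Tensor, boxes_b: torch.Tensor) -> torch.Tensor:
        # boxes_a, boxes_b: (N, 6) tensors of (cx, cy, cz, w, l, h), paired row by row.
        # Yaw rotation is ignored; an assumption made only for this sketch.
        min_a = boxes_a[:, :3] - boxes_a[:, 3:] / 2  # per-box min corner
        max_a = boxes_a[:, :3] + boxes_a[:, 3:] / 2  # per-box max corner
        min_b = boxes_b[:, :3] - boxes_b[:, 3:] / 2
        max_b = boxes_b[:, :3] + boxes_b[:, 3:] / 2
        # Overlap along each axis, clamped at zero when the boxes are disjoint.
        overlap = (torch.minimum(max_a, max_b) - torch.maximum(min_a, min_b)).clamp(min=0)
        inter = overlap.prod(dim=1)            # intersection volume
        vol_a = boxes_a[:, 3:].prod(dim=1)     # volume of each box in a
        vol_b = boxes_b[:, 3:].prod(dim=1)     # volume of each box in b
        return inter / (vol_a + vol_b - inter).clamp(min=1e-6)

Per the abstract, the paper's shape-aware variant additionally models the offset direction and scale of the regressed box; a quantity of this kind would typically enter the regression loss as 1 - IoU alongside the center-heatmap objective.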
About the Journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.