Exploring Geometry-aware Contrast and Clustering Harmonization for Self-supervised 3D Object Detection

2021 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date : 2021-10-01 DOI:10.1109/ICCV48922.2021.00328

Hanxue Liang, Chenhan Jiang, Dapeng Feng, Xin Chen, Hang Xu, Xiaodan Liang, Wei Zhang, Zhenguo Li, L. Gool, Sun Yat-sen

{"title":"Exploring Geometry-aware Contrast and Clustering Harmonization for Self-supervised 3D Object Detection","authors":"Hanxue Liang, Chenhan Jiang, Dapeng Feng, Xin Chen, Hang Xu, Xiaodan Liang, Wei Zhang, Zhenguo Li, L. Gool, Sun Yat-sen","doi":"10.1109/ICCV48922.2021.00328","DOIUrl":null,"url":null,"abstract":"Current 3D object detection paradigms highly rely on extensive annotation efforts, which makes them not practical in many real-world industrial applications. Inspired by that a human driver can keep accumulating experiences from self-exploring the roads without any tutor’s guidance, we first step forwards to explore a simple yet effective self-supervised learning framework tailored for LiDAR-based 3D object detection. Although the self-supervised pipeline has achieved great success in 2D domain, the characteristic challenges (e.g., complex geometry structure and various 3D object views) encountered in the 3D domain hinder the direct adoption of existing techniques that often contrast the 2D augmented data or cluster single-view features. Here we present a novel self-supervised 3D Object detection framework that seamlessly integrates the geometry-aware contrast and clustering harmonization to lift the unsupervised 3D representation learning, named GCC-3D. First, GCC-3D introduces a Geometric-Aware Contrastive objective to learn spatial-sensitive local structure representation. This objective enforces the spatially close voxels to have high feature similarity. Second, a Pseudo-Instance Clustering harmonization mechanism is proposed to encourage that different views of pseudo-instances should have consistent similarities to clustering prototype centers. This module endows our model semantic discriminative capacity. Extensive experiments demonstrate our GCC-3D achieves significant performance improvement on data-efficient 3D object detection benchmarks (nuScenes and Waymo). Moreover, our GCC-3D framework can achieve state-of-the art performances on all popular 3D object detection benchmarks.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"94 1","pages":"3273-3282"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV48922.2021.00328","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 40

Abstract

Current 3D object detection paradigms highly rely on extensive annotation efforts, which makes them not practical in many real-world industrial applications. Inspired by that a human driver can keep accumulating experiences from self-exploring the roads without any tutor’s guidance, we first step forwards to explore a simple yet effective self-supervised learning framework tailored for LiDAR-based 3D object detection. Although the self-supervised pipeline has achieved great success in 2D domain, the characteristic challenges (e.g., complex geometry structure and various 3D object views) encountered in the 3D domain hinder the direct adoption of existing techniques that often contrast the 2D augmented data or cluster single-view features. Here we present a novel self-supervised 3D Object detection framework that seamlessly integrates the geometry-aware contrast and clustering harmonization to lift the unsupervised 3D representation learning, named GCC-3D. First, GCC-3D introduces a Geometric-Aware Contrastive objective to learn spatial-sensitive local structure representation. This objective enforces the spatially close voxels to have high feature similarity. Second, a Pseudo-Instance Clustering harmonization mechanism is proposed to encourage that different views of pseudo-instances should have consistent similarities to clustering prototype centers. This module endows our model semantic discriminative capacity. Extensive experiments demonstrate our GCC-3D achieves significant performance improvement on data-efficient 3D object detection benchmarks (nuScenes and Waymo). Moreover, our GCC-3D framework can achieve state-of-the art performances on all popular 3D object detection benchmarks.

查看原文本刊更多论文

探索自监督三维目标检测的几何感知对比度和聚类协调

当前的3D目标检测范式高度依赖于大量的注释工作，这使得它们在许多现实世界的工业应用中不实用。受人类驾驶员可以在没有任何导师指导的情况下通过自我探索道路不断积累经验的启发，我们第一步探索了为基于激光雷达的3D物体检测量身定制的简单而有效的自我监督学习框架。尽管自监督管道在2D领域取得了巨大的成功，但在3D领域遇到的特征挑战(例如，复杂的几何结构和各种3D对象视图)阻碍了现有技术的直接采用，这些技术通常与2D增强数据或聚类单视图特征进行对比。本文提出了一种新的自监督3D物体检测框架，该框架无缝集成了几何感知对比度和聚类协调，以提升无监督3D表示学习，称为GCC-3D。首先，GCC-3D引入了一个几何感知对比目标来学习空间敏感的局部结构表示。这一目标使得空间上相近的体素具有较高的特征相似性。其次，提出了一种伪实例聚类协调机制，以鼓励伪实例的不同观点与聚类原型中心具有一致的相似性。该模块赋予了模型语义判别能力。广泛的实验表明，我们的GCC-3D在数据高效的3D物体检测基准(nuScenes和Waymo)上取得了显着的性能改进。此外，我们的GCC-3D框架可以在所有流行的3D物体检测基准上实现最先进的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2021 IEEE/CVF International Conference on Computer Vision (ICCV)

自引率

0.00%

发文量