MA-ST3D: Motion Associated Self-Training for Unsupervised Domain Adaptation on 3D Object Detection

IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society), vol. 33, pp. 6227-6240. Pub Date: 2024-10-24. DOI: 10.1109/TIP.2024.3482976

Chi Zhang; Wenbo Chen; Wei Wang; Zhaoxiang Zhang

https://ieeexplore.ieee.org/document/10735102/

Citations: 0

Abstract

Recently, unsupervised domain adaptation (UDA) for 3D object detectors has attracted increasing attention as a way to avoid the prohibitive cost of producing the large-scale 3D annotations that effective model training requires. Self-training (ST) has emerged as a simple and effective technique for UDA. The main challenge in ST-based UDA (ST-UDA) for 3D object detection is refining the imprecise predictions caused by domain shift so that accurate pseudo labels can serve as supervisory signals. This study presents a novel ST-UDA framework, named motion-associated self-training for 3D object detection (MA-ST3D), which generates high-quality pseudo labels by associating predictions across 3D point cloud sequences during ego-motion according to spatial and temporal consistency. MA-ST3D adopts a global-local pathway (GLP) architecture that produces high-quality pseudo labels by exploiting both intra-frame and inter-frame consistency along the spatial dimension of the LiDAR's ego-motion. It is also equipped with two memory modules, a global memory and a local memory for the global and local pathways respectively, which suppress temporal fluctuation of the pseudo labels across self-training iterations. In addition, a motion-aware loss is introduced that imposes differentiated regularization on pseudo labels with different motion statuses, which mitigates the harmful propagation of false-positive pseudo labels. Finally, our method is evaluated on three representative domain adaptation tasks built on widely used 3D benchmark datasets (i.e., Waymo, KITTI, and nuScenes). MA-ST3D achieves state-of-the-art performance in all evaluated UDA settings and even surpasses weakly supervised DA methods on the KITTI and nuScenes object detection benchmarks.
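The core idea of associating predictions across a LiDAR sequence by compensating for ego-motion can be illustrated with a minimal sketch. It assumes each frame comes with a 4x4 sensor-to-world ego pose, represents each detection only by its box center, and matches boxes between two frames greedily by nearest world-frame center under a hypothetical threshold `max_dist`. The abstract does not describe the global-local pathway at this level of detail, so `to_world` and `associate` are illustrative names, not the authors' implementation.

```python
import numpy as np


def to_world(centers: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Map (N, 3) box centers from the sensor frame to the world frame.

    `pose` is assumed to be a 4x4 homogeneous sensor-to-world transform.
    """
    homo = np.hstack([centers, np.ones((len(centers), 1))])  # (N, 4)
    return (pose @ homo.T).T[:, :3]


def associate(prev_world: np.ndarray, curr_world: np.ndarray, max_dist: float = 1.0):
    """Greedy nearest-center matching of boxes from two frames in world coordinates.

    Returns (prev_idx, curr_idx) pairs; boxes left unmatched have no
    cross-frame support and are candidates for rejection as pseudo labels.
    """
    if len(prev_world) == 0 or len(curr_world) == 0:
        return []
    # Pairwise Euclidean distances, shape (len(prev_world), len(curr_world)).
    dists = np.linalg.norm(prev_world[:, None, :] - curr_world[None, :, :], axis=-1)
    pairs, used_prev, used_curr = [], set(), set()
    # Visit candidate pairs from closest to farthest.
    for flat in np.argsort(dists, axis=None):
        i, j = (int(v) for v in np.unravel_index(flat, dists.shape))
        if dists[i, j] > max_dist:
            break  # all remaining pairs are even farther apart
        if i in used_prev or j in used_curr:
            continue
        pairs.append((i, j))
        used_prev.add(i)
        used_curr.add(j)
    return pairs


# A static car 10 m ahead; between frames the ego vehicle moves 2 m along x,
# so the car's sensor-frame x drops from 10 to 8 but its world position is unchanged.
pose_prev = np.eye(4)
pose_curr = np.eye(4)
pose_curr[0, 3] = 2.0
prev = to_world(np.array([[10.0, 0.0, 0.0]]), pose_prev)
curr = to_world(np.array([[8.0, 0.0, 0.0], [30.0, 5.0, 0.0]]), pose_curr)
print(associate(prev, curr))  # [(0, 0)]
```

Working in world coordinates is what makes the comparison meaningful: without the pose transform, the static car would appear to move 2 m between frames purely because of ego-motion.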
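One plausible way a memory module can suppress round-to-round fluctuation of pseudo labels is sketched below. This is a generic memory-ensemble pattern, assumed for illustration; the abstract does not give MA-ST3D's actual update rule for its global and local memories. The `PseudoLabel` type and the parameters `match_dist`, `momentum`, `decay`, and `drop_thresh` are all hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PseudoLabel:
    center: np.ndarray  # (3,) box center in world coordinates
    score: float        # detector confidence in [0, 1]


def update_memory(memory, new_labels, match_dist=1.0, momentum=0.7,
                  decay=0.5, drop_thresh=0.3):
    """Merge this round's pseudo labels into memory to damp round-to-round flicker.

    - matched: score smoothed by an exponential moving average, center taken
      from the new prediction;
    - memory only: score decayed, entry dropped once it falls below drop_thresh;
    - new only: inserted unchanged.
    Matching is greedy, in memory order, by nearest center within match_dist.
    """
    merged, used = [], set()
    for old in memory:
        best, best_d = None, match_dist
        for k, new in enumerate(new_labels):
            d = float(np.linalg.norm(old.center - new.center))
            if k not in used and d <= best_d:
                best, best_d = k, d
        if best is not None:
            used.add(best)
            new = new_labels[best]
            score = momentum * old.score + (1 - momentum) * new.score
            merged.append(PseudoLabel(new.center, score))
        elif old.score * decay >= drop_thresh:
            merged.append(PseudoLabel(old.center, old.score * decay))
    merged.extend(lab for k, lab in enumerate(new_labels) if k not in used)
    return merged


memory = [PseudoLabel(np.array([0.0, 0.0, 0.0]), 0.9),
          PseudoLabel(np.array([20.0, 0.0, 0.0]), 0.5)]
new_round = [PseudoLabel(np.array([0.2, 0.0, 0.0]), 0.4),
             PseudoLabel(np.array([50.0, 0.0, 0.0]), 0.8)]
for lab in update_memory(memory, new_round):
    print(lab.center, round(lab.score, 2))
```

In this example two labels survive: the box near the origin, whose sudden confidence drop (0.9 to 0.4) is smoothed to 0.75, and the newly detected box at x = 50. The box at x = 20, which the new round did not confirm, decays to 0.25 and is dropped.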
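The idea of a motion-aware loss, which treats pseudo labels differently depending on their motion status, can be sketched as a per-label weight on a standard smooth-L1 box-regression loss. The sketch assumes motion status is decided by the world-frame speed of an already associated object against a hypothetical `speed_thresh`. The abstract does not state the loss form or which status receives more weight, so the function names and the default weights `w_static` and `w_moving` are illustrative assumptions only.

```python
import numpy as np


def motion_status(centers_t0, centers_t1, dt, speed_thresh=0.5):
    """Return a boolean (N,) mask: True where an associated object is moving.

    centers_t0 / centers_t1 are (N, 3) world-frame centers of the same N objects
    in two frames dt seconds apart. Ego-motion is already removed by the world
    transform, so any remaining displacement is the object's own motion.
    """
    speed = np.linalg.norm(centers_t1 - centers_t0, axis=1) / dt  # m/s
    return speed > speed_thresh


def motion_weighted_smooth_l1(pred, target, moving, w_static=1.0, w_moving=0.5, beta=1.0):
    """Smooth-L1 box-regression loss with a per-pseudo-label weight set by motion status."""
    diff = np.abs(pred - target)
    per_elem = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    per_label = per_elem.sum(axis=1)                # (N,) loss of each pseudo label
    weights = np.where(moving, w_moving, w_static)  # (N,) motion-dependent weights
    return float((weights * per_label).sum() / max(weights.sum(), 1e-6))


moving = motion_status(np.array([[10.0, 0, 0], [20.0, 0, 0]]),
                       np.array([[10.0, 0, 0], [21.0, 0, 0]]), dt=0.1)
loss = motion_weighted_smooth_l1(np.zeros((2, 2)), np.array([[0.5, 0.0], [2.0, 0.0]]), moving)
print(moving.tolist())  # [False, True]
print(round(loss, 4))   # 0.5833
```

Normalizing by the sum of weights keeps the loss scale comparable regardless of how many pseudo labels in a batch are static or moving.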

