{"title":"VelocityNet: Motion-Driven Feature Aggregation for 3D Object Detection in Point Cloud Sequences","authors":"David Emmerichs, Peter Pinggera, B. Ommer","doi":"10.1109/ICRA48506.2021.9561644","DOIUrl":null,"url":null,"abstract":"The most successful methods for LiDAR-based 3D object detection use sequences of point clouds in order to exploit the increased data density through temporal aggregation. However, common aggregation methods are rarely able to capture fast-moving objects appropriately. These objects are displaced by large distances between frames and naive approaches are not able to successfully leverage the full amount of information spread across time. Yet, especially in autonomous driving scenarios, fast-moving objects are most crucial to detect as they actively take part in highly dynamic traffic situations. This work presents a novel network architecture called VelocityNet which is explicitly designed to temporally align features according to object motion. Our approach extends traditional 3D convolutions by a motion-driven deformation of the convolution kernels across the temporal dimension. The required motion information can be obtained from various sources, ranging from external computation or complementary sensors to an integrated network branch which is trained jointly with the object detection task. The explicit feature alignment allows the training process to focus on the object detection problem and results in a significant increase in detection performance compared to the popular PointPillars baseline, not only for dynamic but also for static objects. We evaluate our approach on the nuScenes dataset and analyze the main reasons for the observed performance gains.","PeriodicalId":108312,"journal":{"name":"2021 IEEE International Conference on Robotics and Automation (ICRA)","volume":"318 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Robotics and Automation (ICRA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRA48506.2021.9561644","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
The most successful methods for LiDAR-based 3D object detection use sequences of point clouds to exploit the increased data density obtained through temporal aggregation. However, common aggregation methods rarely capture fast-moving objects appropriately: these objects are displaced by large distances between frames, and naive approaches cannot fully leverage the information spread across time. Yet fast-moving objects are among the most crucial to detect, especially in autonomous driving scenarios, as they actively take part in highly dynamic traffic situations. This work presents a novel network architecture, VelocityNet, that is explicitly designed to temporally align features according to object motion. Our approach extends traditional 3D convolutions with a motion-driven deformation of the convolution kernels across the temporal dimension. The required motion information can be obtained from various sources, ranging from external computation or complementary sensors to an integrated network branch trained jointly with the object detection task. The explicit feature alignment allows the training process to focus on the object detection problem and yields a significant increase in detection performance over the popular PointPillars baseline, not only for dynamic but also for static objects. We evaluate our approach on the nuScenes dataset and analyze the main reasons for the observed performance gains.
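To make the idea of motion-driven temporal alignment concrete, the sketch below warps per-frame bird's-eye-view feature maps toward the current frame using a motion field before fusing them with a convolution. This is an illustrative approximation under stated assumptions, not the paper's implementation: the abstract describes deforming the convolution kernels themselves across time, whereas this sketch realizes the equivalent alignment by sampling past features at motion-displaced locations. All names (warp_features, aggregate_sequence, fuse_conv) and tensor shapes are hypothetical.

    # Illustrative sketch only, not the authors' code. Assumes per-frame BEV
    # feature maps of shape (B, C, H, W) and, for each past frame, a 2D motion
    # field (B, 2, H, W) in pixels that maps it to the current frame.
    import torch
    import torch.nn.functional as F

    def warp_features(feat, flow):
        """Bilinearly sample `feat` (B, C, H, W) at locations displaced by
        `flow` (B, 2, H, W), aligning a past frame to the current one."""
        b, _, h, w = feat.shape
        # Base sampling grid in pixel coordinates.
        ys, xs = torch.meshgrid(
            torch.arange(h, dtype=feat.dtype, device=feat.device),
            torch.arange(w, dtype=feat.dtype, device=feat.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # (B, 2, H, W)
        # Normalize to [-1, 1], as required by grid_sample.
        grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
        grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
        grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
        return F.grid_sample(feat, grid, align_corners=True)

    def aggregate_sequence(feats, flows, fuse_conv):
        """feats: list of T per-frame BEV maps (B, C, H, W), oldest first.
        flows: list of T-1 motion fields, one per past frame. Past features
        are warped into the current frame before fusion, so fast-moving
        objects stay spatially aligned across time."""
        current = feats[-1]
        aligned = [warp_features(f, fl) for f, fl in zip(feats[:-1], flows)]
        stacked = torch.cat(aligned + [current], dim=1)  # (B, T*C, H, W)
        return fuse_conv(stacked)

With T frames of C-channel features, fuse_conv could simply be torch.nn.Conv2d(T * C, C, kernel_size=3, padding=1). The motion fields feeding this step could come from any of the sources the abstract lists: external computation, complementary sensors, or a jointly trained network branch.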