{"title":"VelocityNet: Motion-Driven Feature Aggregation for 3D Object Detection in Point Cloud Sequences","authors":"David Emmerichs, Peter Pinggera, B. Ommer","doi":"10.1109/ICRA48506.2021.9561644","DOIUrl":null,"url":null,"abstract":"The most successful methods for LiDAR-based 3D object detection use sequences of point clouds in order to exploit the increased data density through temporal aggregation. However, common aggregation methods are rarely able to capture fast-moving objects appropriately. These objects are displaced by large distances between frames and naive approaches are not able to successfully leverage the full amount of information spread across time. Yet, especially in autonomous driving scenarios, fast-moving objects are most crucial to detect as they actively take part in highly dynamic traffic situations. This work presents a novel network architecture called VelocityNet which is explicitly designed to temporally align features according to object motion. Our approach extends traditional 3D convolutions by a motion-driven deformation of the convolution kernels across the temporal dimension. The required motion information can be obtained from various sources, ranging from external computation or complementary sensors to an integrated network branch which is trained jointly with the object detection task. The explicit feature alignment allows the training process to focus on the object detection problem and results in a significant increase in detection performance compared to the popular PointPillars baseline, not only for dynamic but also for static objects. We evaluate our approach on the nuScenes dataset and analyze the main reasons for the observed performance gains.","PeriodicalId":108312,"journal":{"name":"2021 IEEE International Conference on Robotics and Automation (ICRA)","volume":"318 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Robotics and Automation (ICRA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRA48506.2021.9561644","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
The most successful methods for LiDAR-based 3D object detection use sequences of point clouds to exploit the increased data density obtained through temporal aggregation. However, common aggregation methods rarely capture fast-moving objects appropriately: these objects are displaced by large distances between frames, and naive approaches cannot fully leverage the information spread across time. Yet fast-moving objects are among the most crucial to detect, especially in autonomous driving scenarios, as they actively take part in highly dynamic traffic situations. This work presents a novel network architecture, VelocityNet, that is explicitly designed to temporally align features according to object motion. Our approach extends traditional 3D convolutions with a motion-driven deformation of the convolution kernels across the temporal dimension. The required motion information can be obtained from various sources, ranging from external computation or complementary sensors to an integrated network branch trained jointly with the object detection task. The explicit feature alignment allows the training process to focus on the object detection problem and yields a significant increase in detection performance over the popular PointPillars baseline, not only for dynamic but also for static objects. We evaluate our approach on the nuScenes dataset and analyze the main reasons for the observed performance gains.
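To make the idea of motion-driven temporal alignment concrete, the sketch below warps per-frame bird's-eye-view feature maps toward the current frame using a motion field before fusing them with a convolution. This is an illustrative approximation under stated assumptions, not the paper's implementation: the abstract describes deforming the convolution kernels themselves across time, whereas this sketch realizes the equivalent alignment by sampling past features at motion-displaced locations. All names (warp_features, aggregate_sequence, fuse_conv) and tensor shapes are hypothetical.

    # Illustrative sketch only, not the authors' code. Assumes per-frame BEV
    # feature maps of shape (B, C, H, W) and, for each past frame, a 2D motion
    # field (B, 2, H, W) in pixels that maps it to the current frame.
    import torch
    import torch.nn.functional as F

    def warp_features(feat, flow):
        """Bilinearly sample `feat` (B, C, H, W) at locations displaced by
        `flow` (B, 2, H, W), aligning a past frame to the current one."""
        b, _, h, w = feat.shape
        # Base sampling grid in pixel coordinates.
        ys, xs = torch.meshgrid(
            torch.arange(h, dtype=feat.dtype, device=feat.device),
            torch.arange(w, dtype=feat.dtype, device=feat.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # (B, 2, H, W)
        # Normalize to [-1, 1], as required by grid_sample.
        grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
        grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
        grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
        return F.grid_sample(feat, grid, align_corners=True)

    def aggregate_sequence(feats, flows, fuse_conv):
        """feats: list of T per-frame BEV maps (B, C, H, W), oldest first.
        flows: list of T-1 motion fields, one per past frame. Past features
        are warped into the current frame before fusion, so fast-moving
        objects stay spatially aligned across time."""
        current = feats[-1]
        aligned = [warp_features(f, fl) for f, fl in zip(feats[:-1], flows)]
        stacked = torch.cat(aligned + [current], dim=1)  # (B, T*C, H, W)
        return fuse_conv(stacked)

With T frames of C-channel features, fuse_conv could simply be torch.nn.Conv2d(T * C, C, kernel_size=3, padding=1). The motion fields feeding this step could come from any of the sources the abstract lists: external computation, complementary sensors, or a jointly trained network branch.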