HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2022-06-01 DOI:10.1109/CVPR52688.2022.00862

Zikang Zhou, Luyao Ye, Jianping Wang, Kui Wu, K. Lu


Citations: 71

Abstract

Accurately predicting the future motions of surrounding traffic agents is critical for the safety of autonomous vehicles. Recently, vectorized approaches have dominated the motion prediction community due to their capability of capturing complex interactions in traffic scenes. However, existing methods neglect the symmetries of the problem and suffer from high computational cost, which makes it difficult to perform real-time multi-agent motion prediction without sacrificing prediction performance. To tackle this challenge, we propose Hierarchical Vector Transformer (HiVT) for fast and accurate multi-agent motion prediction. By decomposing the problem into local context extraction and global interaction modeling, our method can effectively and efficiently model a large number of agents in the scene. Meanwhile, we propose a translation-invariant scene representation and rotation-invariant spatial learning modules, which extract features robust to the geometric transformations of the scene and enable the model to make accurate predictions for multiple agents in a single forward pass. Experiments show that HiVT achieves state-of-the-art performance on the Argoverse motion forecasting benchmark with a small model size and can make fast multi-agent motion prediction.
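The symmetry argument in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a trajectory is a list of (x, y) positions, that translation invariance is obtained by using per-step displacement vectors instead of absolute positions, and that rotation invariance is obtained by expressing those vectors in a frame aligned with the agent's most recent heading. All function names here are hypothetical and chosen for illustration only.

```python
import math


def displacement_vectors(trajectory):
    """Convert absolute (x, y) positions into per-step displacements.

    Displacements do not change when the whole scene is shifted.
    """
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])]


def rotate(vec, angle):
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


def local_features(trajectory):
    """Displacements expressed in a frame aligned with the latest heading.

    Assumption: the heading is taken from the last displacement vector.
    """
    disps = displacement_vectors(trajectory)
    hx, hy = disps[-1]
    heading = math.atan2(hy, hx)
    return [rotate(d, -heading) for d in disps]


def transform_scene(trajectory, angle, shift):
    """Apply a rigid transform: rotate every point, then translate it."""
    moved = []
    for point in trajectory:
        x, y = rotate(point, angle)
        moved.append((x + shift[0], y + shift[1]))
    return moved


def max_abs_diff(a, b):
    """Largest coordinate-wise difference between two vector lists."""
    return max(abs(u - v) for p, q in zip(a, b) for u, v in zip(p, q))


trajectory = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.7), (2.8, 1.5)]
moved = transform_scene(trajectory, angle=0.9, shift=(35.0, -12.0))

features_original = local_features(trajectory)
features_moved = local_features(moved)
difference = max_abs_diff(features_original, features_moved)

# The features agree up to floating-point error, although the two
# trajectories have very different absolute coordinates.
print(difference < 1e-9)
```

Because the features are unchanged under a rigid transform of the scene, a model that consumes them does not need to re-normalize the scene around each target agent, which is the property that makes predicting for multiple agents in one forward pass plausible.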

