A2X: An Agent and Environment Interaction Benchmark for Multimodal Human Trajectory Prediction

Proceedings of the 14th ACM SIGGRAPH Conference on Motion, Interaction and Games. Pub Date: 2021-11-10. DOI: 10.1145/3487983.3488302

Samuel S. Sohn, Mihee Lee, Seonghyeon Moon, Gang Qiao, Muhammad Usman, Sejong Yoon, V. Pavlovic, Mubbasir Kapadia


Citations: 5

Abstract

In recent years, human trajectory prediction (HTP) has garnered attention in the computer vision literature. Although this task has much in common with the longstanding task of crowd simulation, little has been borrowed from crowd simulation, especially in terms of evaluation protocols. The key difference between the two tasks is that HTP is concerned with forecasting multiple steps at a time and capturing the multimodality of real human trajectories. A majority of HTP models are trained on the same few datasets, which feature small, transient interactions between real people and little to no interaction between people and the environment. Unsurprisingly, when tested on crowd egress scenarios, these models produce erroneous trajectories that accelerate too quickly and collide too frequently, but the metrics used in HTP literature cannot convey these particular issues. To address these challenges, we propose (1) the A2X dataset, which has simulated crowd egress and complex navigation scenarios that compensate for the lack of agent-to-environment interaction in existing real datasets, and (2) evaluation metrics that convey model performance with more reliability and nuance. A subset of these metrics are novel multiverse metrics, which are better suited for multimodal models than existing metrics. The dataset is available at: https://mubbasir.github.io/HTP-benchmark/.
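
The claim that common HTP metrics cannot convey collisions or unrealistic acceleration can be illustrated with a minimal sketch. It assumes that displacement errors (ADE, FDE, and a best-of-K variant) are representative of the metrics used in HTP literature; the abstract does not name them. The functions `collision_steps` and `max_acceleration`, the agent radius of 0.3 m, and the time step of 0.4 s are hypothetical choices made for illustration only, and they are not the metrics proposed in the paper.

```python
import math

def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over all time steps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last time step."""
    return math.dist(pred[-1], truth[-1])

def min_ade(samples, truth):
    """Best-of-K evaluation: lowest ADE among K sampled predictions."""
    return min(ade(s, truth) for s in samples)

def collision_steps(traj_a, traj_b, radius=0.3):
    """Hypothetical check: count time steps where two disc-shaped agents overlap."""
    return sum(1 for a, b in zip(traj_a, traj_b) if math.dist(a, b) < 2 * radius)

def max_acceleration(traj, dt=0.4):
    """Hypothetical check: largest acceleration magnitude via finite differences."""
    vel = [((x1 - x0) / dt, (y1 - y0) / dt)
           for (x0, y0), (x1, y1) in zip(traj, traj[1:])]
    acc = [math.hypot((vx1 - vx0) / dt, (vy1 - vy0) / dt)
           for (vx0, vy0), (vx1, vy1) in zip(vel, vel[1:])]
    return max(acc) if acc else 0.0

# Two agents walk side by side in lanes 1 m apart (made-up ground truth).
truth_a = [(0.5 * t, 0.0) for t in range(4)]
truth_b = [(0.5 * t, 1.0) for t in range(4)]

# Each prediction drifts 0.25 m toward the other lane.
pred_a = [(x, 0.25) for x, _ in truth_a]
pred_b = [(x, 0.75) for x, _ in truth_b]

print("ADE a:", ade(pred_a, truth_a))
print("FDE a:", fde(pred_a, truth_a))
print("overlapping steps (truth):", collision_steps(truth_a, truth_b))
print("overlapping steps (pred): ", collision_steps(pred_a, pred_b))
```

In this toy case both predictions have a small displacement error of 0.25 m, yet the predicted agents overlap at every time step while the ground-truth agents never do. A displacement error alone would not reveal that difference, which is the kind of gap the abstract describes.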



