A Holistic Approach for Role Inference and Action Anticipation in Human Teams

ACM Transactions on Intelligent Systems and Technology (TIST), Pub Date: 2022-05-28, DOI: 10.1145/3531230

Junyi Dong, Qingze Huo, Silvia Ferrari


Citations: 2

Abstract

The ability to anticipate human actions is critical to many cyber-physical systems, such as robots and autonomous vehicles. Computer vision and sensing algorithms to date have focused on extracting and predicting visual features that are explicit in the scene, such as color, appearance, actions, positions, and velocities, using video and physical measurements, such as object depth and motion. Human actions, however, are intrinsically influenced and motivated by many implicit factors, such as context, human roles and interactions, past experience, and inner goals or intentions. For example, in a sports team, the team strategy, the player’s role, and the dynamic circumstances driven by the opponents’ behavior all influence the actions of each player. This article proposes a holistic framework that incorporates visual features together with hidden information, such as social roles, and domain knowledge. The approach relies on a novel dynamic Markov random field (DMRF) model to infer the instantaneous team strategy and, subsequently, the players’ roles, which evolve over time throughout the game. The results of the DMRF inference stage are then integrated with instantaneous visual features, such as individual actions and position, to perform holistic action anticipation using a multi-layer perceptron (MLP). The approach is demonstrated on the team sport of volleyball by first training the DMRF and MLP offline on past videos and then applying them online to new volleyball videos. The results show that the method infers the players’ roles with an average accuracy of 86.99% and anticipates future actions over a sequence of up to 46 frames with an average accuracy of 80.50%. Additionally, the method predicts the onset and duration of each action with mean relative errors of 14.57% and 15.67%, respectively.
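As a rough illustration of the second stage and the reported metrics, the sketch below fuses inferred role probabilities with visual features, passes them through a small perceptron, and computes accuracy and mean relative error. It is not the authors' implementation: the function names, the single hidden layer, the feature layout, and the metric definitions (fraction of exact matches, and mean of |predicted - true| / |true|) are all assumptions made for illustration.

```python
import numpy as np

def fuse_features(role_probs, visual_feats):
    """Concatenate role probabilities (assumed DMRF output) with visual features."""
    return np.concatenate([role_probs, visual_feats], axis=-1)

def mlp_forward(x, w1, b1, w2, b2):
    """Hypothetical one-hidden-layer perceptron: ReLU hidden layer, softmax output."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    logits = hidden @ w2 + b2
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

def accuracy(predicted, true):
    """Fraction of predictions that match the labels exactly."""
    return float(np.mean(np.asarray(predicted) == np.asarray(true)))

def mean_relative_error(predicted, true):
    """Mean of |predicted - true| / |true| (assumed definition; true must be nonzero)."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean(np.abs(predicted - true) / np.abs(true)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_players, n_roles, n_visual, n_hidden, n_actions = 6, 4, 5, 8, 3
    role_probs = rng.dirichlet(np.ones(n_roles), size=n_players)  # placeholder values
    visual_feats = rng.normal(size=(n_players, n_visual))         # placeholder values
    x = fuse_features(role_probs, visual_feats)
    w1 = rng.normal(size=(n_roles + n_visual, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(size=(n_hidden, n_actions))
    b2 = np.zeros(n_actions)
    action_probs = mlp_forward(x, w1, b1, w2, b2)
    print(action_probs.argmax(axis=-1))  # anticipated action index per player
```

The weights here are random placeholders; in the setting the abstract describes, both models would be trained offline on past videos before being applied to new ones.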



