Tapestry of Time and Actions: Modeling Human Activity Sequences using Temporal Point Process Flows

IF 6.6 · CAS Zone 4, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-02-29 DOI:10.1145/3650045

Vinayak Gupta, Srikanta Bedathur


Citations: 0

Abstract

Human beings engage in a vast range of activities and tasks that demonstrate their ability to adapt to different scenarios. These activities range from the simplest daily routines, like walking and sitting, to complex multi-level endeavors such as cooking a four-course meal. Any human activity can be represented as a temporal sequence of actions performed to achieve a certain goal. Unlike time-series datasets extracted from electronics or machines, these action sequences are highly disparate in nature: the time taken to finish a sequence of actions can vary from person to person. Understanding the dynamics of these sequences is therefore essential for many downstream tasks, such as activity length prediction, goal prediction, and next-action recommendation. Existing neural network-based approaches that learn a continuous-time activity sequence (CTAS) either require visual data or are designed for a single task, i.e., limited to next-action or goal prediction. In this paper, we present ProActive, a neural marked temporal point process (MTPP) framework for modeling the continuous-time distribution of actions in an activity sequence while simultaneously addressing three high-impact problems: next-action prediction, sequence-goal prediction, and end-to-end sequence generation. Specifically, we use a self-attention module with temporal normalizing flows to model the influence between actions in a sequence and their inter-arrival times. Moreover, for time-sensitive prediction, we perform early detection of the sequence goal via a constrained margin-based optimization procedure. This in turn allows ProActive to predict the sequence goal using a limited number of actions. In addition, we propose a novel extension of the ProActive model, called ProActive++, that can handle variations in the order of actions, i.e., different ways of achieving a given goal. We demonstrate that this variant can learn the order in which a person or actor prefers to perform their actions. Extensive experiments on sequences derived from three activity recognition datasets show that ProActive and ProActive++ achieve a significant accuracy boost over the state of the art in action and goal prediction, and demonstrate the first-ever application of end-to-end action sequence generation.
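To make the idea of a temporal normalizing flow over inter-arrival times more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest possible flow, a single log-normal layer with fixed parameters `mu` and `sigma`; in a model such as ProActive these parameters would presumably be produced by the self-attention module from the action history. The function names `lognormal_flow_logpdf` and `sequence_time_nll` are hypothetical.

```python
import math


def lognormal_flow_logpdf(dt, mu, sigma):
    """Log-density of an inter-arrival time dt > 0 under a one-layer flow.

    Assumed flow (illustrative only): z ~ N(0, 1), dt = exp(mu + sigma * z).
    Inverse: z = (log(dt) - mu) / sigma, with |dz/d(dt)| = 1 / (sigma * dt).
    """
    if dt <= 0 or sigma <= 0:
        raise ValueError("dt and sigma must be positive")
    z = (math.log(dt) - mu) / sigma
    # Log-density of the standard normal base distribution at z.
    log_base = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    # Log absolute Jacobian of the inverse transform (change of variables).
    log_det = -math.log(sigma) - math.log(dt)
    return log_base + log_det


def sequence_time_nll(timestamps, mu, sigma):
    """Negative log-likelihood of the gaps between consecutive action times."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return -sum(lognormal_flow_logpdf(gap, mu, sigma) for gap in gaps)


if __name__ == "__main__":
    # Hypothetical action timestamps (in seconds) for one activity sequence.
    times = [0.0, 1.5, 2.0, 4.5]
    print(sequence_time_nll(times, mu=0.0, sigma=1.0))
```

Minimizing this negative log-likelihood over observed sequences is the usual way such a flow is fitted; a full MTPP would add a term for the action type (the mark) of each event.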


Source journal

ACM Transactions on Intelligent Systems and Technology (categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 9.30

Self-citation rate: 2.00%

Articles published: 131

Journal description: ACM Transactions on Intelligent Systems and Technology (ACM TIST) is a scholarly journal that publishes high-quality papers on intelligent systems, applicable algorithms, and technology from a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) that allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST publishes six issues a year. Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs, or detailed experimental results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.