Knowledge-guided recurrent neural network learning for task-oriented action prediction

2017 IEEE International Conference on Multimedia and Expo (ICME); Pub Date: 2017-07-01; DOI: 10.1109/ICME.2017.8019345

Liang Lin, Lili Huang, Tianshui Chen, Yukang Gan, Hui Cheng


Citations: 15

Abstract

This paper addresses task-oriented action prediction, i.e., predicting a sequence of actions that accomplishes a specific task in a given scene, which is a new problem in computer vision research. The main challenges lie in how to model task-specific knowledge and integrate it into the learning procedure. In this work, we propose to train a recurrent long short-term memory (LSTM) network to handle this problem: it takes a scene image (including pre-located objects) and the specified task as input and recurrently predicts the action sequence. However, training such a network usually requires a large number of annotated samples to cover the semantic space (e.g., diverse action decompositions and orderings). To alleviate this issue, we introduce a temporal And-Or graph (AOG) for task description, which hierarchically decomposes a task into atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences consistent with common sense) by training an auxiliary LSTM network on a small set of annotated samples. These generated samples (i.e., task-oriented action sequences) effectively facilitate training the model for task-oriented action prediction. In the experiments, we create a new dataset containing diverse daily tasks and extensively evaluate the effectiveness of our approach.
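
To make the And-Or graph idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the common And-Or semantics: an And-node executes all of its children in order, an Or-node selects exactly one alternative, and leaves are atomic actions. The `Node` class, the `enumerate_sequences` helper, and the tea-making task are hypothetical illustrations that do not come from the paper. The abstract states that an auxiliary LSTM is trained to help produce valid samples; this sketch substitutes plain exhaustive enumeration, which ignores the scene entirely.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple


@dataclass
class Node:
    """One node of a toy And-Or graph (hypothetical structure)."""
    kind: str                     # "and", "or", or "leaf"
    name: str
    children: List["Node"] = field(default_factory=list)


def enumerate_sequences(node: Node) -> List[Tuple[str, ...]]:
    """Return every action sequence the graph rooted at `node` allows."""
    if node.kind == "leaf":
        # A leaf is a single atomic action.
        return [(node.name,)]
    if node.kind == "or":
        # An Or-node contributes the sequences of exactly one child.
        return [seq for child in node.children
                for seq in enumerate_sequences(child)]
    if node.kind == "and":
        # An And-node concatenates one sequence from each child, in order.
        per_child = [enumerate_sequences(child) for child in node.children]
        return [sum(combo, ()) for combo in product(*per_child)]
    raise ValueError(f"unknown node kind: {node.kind!r}")


# Hypothetical task: two alternative ways to get water into a cup.
TASK = Node("and", "make tea", [
    Node("leaf", "grasp cup"),
    Node("or", "get water", [
        Node("and", "use kettle", [
            Node("leaf", "grasp kettle"),
            Node("leaf", "pour into cup"),
        ]),
        Node("leaf", "fill at dispenser"),
    ]),
    Node("leaf", "add tea bag"),
])

sequences = enumerate_sequences(TASK)
for seq in sequences:
    print(" -> ".join(seq))
```

Each printed line is one candidate training sample. A small graph yields few sequences, but the count grows multiplicatively with the number of Or-nodes, which illustrates why a compact graph can stand in for many hand-annotated sequences.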


