使用潜在目标学习的行动预期

Debaditya Roy, Basura Fernando
{"title":"使用潜在目标学习的行动预期","authors":"Debaditya Roy, Basura Fernando","doi":"10.1109/WACV51458.2022.00088","DOIUrl":null,"url":null,"abstract":"To get something done, humans perform a sequence of actions dictated by a goal. So, predicting the next action in the sequence becomes easier once we know the goal that guides the entire activity. We present an action anticipation model that uses goal information in an effective manner. Specifically, we use a latent goal representation as a proxy for the \"real goal\" of the sequence and use this goal information when predicting the next action. We design a model to compute the latent goal representation from the observed video and use it to predict the next action. We also exploit two properties of goals to propose new losses for training the model. First, the effect of the next action should be closer to the latent goal than the observed action, termed as \"goal closeness\". Second, the latent goal should remain consistent before and after the execution of the next action which we coined as \"goal consistency\". Using this technique, we obtain state-of-the-art action anticipation performance on scripted datasets 50Salads and Breakfast that have predefined goals in all their videos. We also evaluate the latent goal-based model on EPIC-KITCHENS55 which is an unscripted dataset with multiple goals being pursued simultaneously. Even though this is not an ideal setup for using latent goals, our model is able to predict the next noun better than existing approaches on both seen and unseen kitchens in the test set.1","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Action anticipation using latent goal learning\",\"authors\":\"Debaditya Roy, Basura Fernando\",\"doi\":\"10.1109/WACV51458.2022.00088\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To get something done, humans perform a sequence of actions dictated by a goal. So, predicting the next action in the sequence becomes easier once we know the goal that guides the entire activity. We present an action anticipation model that uses goal information in an effective manner. Specifically, we use a latent goal representation as a proxy for the \\\"real goal\\\" of the sequence and use this goal information when predicting the next action. We design a model to compute the latent goal representation from the observed video and use it to predict the next action. We also exploit two properties of goals to propose new losses for training the model. First, the effect of the next action should be closer to the latent goal than the observed action, termed as \\\"goal closeness\\\". Second, the latent goal should remain consistent before and after the execution of the next action which we coined as \\\"goal consistency\\\". Using this technique, we obtain state-of-the-art action anticipation performance on scripted datasets 50Salads and Breakfast that have predefined goals in all their videos. We also evaluate the latent goal-based model on EPIC-KITCHENS55 which is an unscripted dataset with multiple goals being pursued simultaneously. Even though this is not an ideal setup for using latent goals, our model is able to predict the next noun better than existing approaches on both seen and unseen kitchens in the test set.1\",\"PeriodicalId\":297092,\"journal\":{\"name\":\"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WACV51458.2022.00088\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACV51458.2022.00088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

摘要

为了完成某件事,人们会根据一个目标执行一系列的行动。因此,一旦我们知道指导整个活动的目标,预测序列中的下一个动作就变得容易了。我们提出了一个有效利用目标信息的动作预期模型。具体来说,我们使用潜在目标表示作为序列的“真实目标”的代理,并在预测下一个动作时使用该目标信息。我们设计了一个模型,从观察到的视频中计算潜在目标的表示,并用它来预测下一步的动作。我们还利用目标的两个性质来提出新的损失来训练模型。首先,下一个行动的效果应该比观察到的行动更接近潜在目标,称为“目标接近”。其次,潜在的目标应该在执行下一个行动之前和之后保持一致,我们称之为“目标一致性”。使用这种技术,我们在脚本数据集50沙拉和早餐上获得了最先进的动作预期性能,这些数据集在所有视频中都有预定义的目标。我们还在EPIC-KITCHENS55上评估了基于潜在目标的模型,EPIC-KITCHENS55是一个同时追求多个目标的无脚本数据集。尽管这不是使用潜在目标的理想设置,但我们的模型能够比现有的方法更好地预测下一个名词,无论是在测试集中看到的还是看不见的厨房
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Action anticipation using latent goal learning
To get something done, humans perform a sequence of actions dictated by a goal. So, predicting the next action in the sequence becomes easier once we know the goal that guides the entire activity. We present an action anticipation model that uses goal information in an effective manner. Specifically, we use a latent goal representation as a proxy for the "real goal" of the sequence and use this goal information when predicting the next action. We design a model to compute the latent goal representation from the observed video and use it to predict the next action. We also exploit two properties of goals to propose new losses for training the model. First, the effect of the next action should be closer to the latent goal than the observed action, termed as "goal closeness". Second, the latent goal should remain consistent before and after the execution of the next action which we coined as "goal consistency". Using this technique, we obtain state-of-the-art action anticipation performance on scripted datasets 50Salads and Breakfast that have predefined goals in all their videos. We also evaluate the latent goal-based model on EPIC-KITCHENS55 which is an unscripted dataset with multiple goals being pursued simultaneously. Even though this is not an ideal setup for using latent goals, our model is able to predict the next noun better than existing approaches on both seen and unseen kitchens in the test set.1
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信