视频中的注视

2017 IEEE International Conference on Computer Vision (ICCV) Pub Date : 2017-10-01 DOI:10.1109/ICCV.2017.160

Adrià Recasens, Carl Vondrick, A. Khosla, A. Torralba

{"title":"视频中的注视","authors":"Adrià Recasens, Carl Vondrick, A. Khosla, A. Torralba","doi":"10.1109/ICCV.2017.160","DOIUrl":null,"url":null,"abstract":"Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame. We collect VideoGaze, a new dataset which we use as a benchmark to both train and evaluate models. Given one frame with a person in it, our model estimates a density for gaze location in every frame and the probability that the person is looking in that particular frame. A key aspect of our approach is an end-to-end model that jointly estimates: saliency, gaze pose, and geometric relationships between views while only using gaze as supervision. Visualizations suggest that the model learns to internally solve these intermediate tasks automatically without additional supervision. Experiments show that our approach follows gaze in video better than existing approaches, enabling a richer understanding of human activities in video.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"42 1","pages":"1444-1452"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"65","resultStr":"{\"title\":\"Following Gaze in Video\",\"authors\":\"Adrià Recasens, Carl Vondrick, A. Khosla, A. Torralba\",\"doi\":\"10.1109/ICCV.2017.160\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame. We collect VideoGaze, a new dataset which we use as a benchmark to both train and evaluate models. Given one frame with a person in it, our model estimates a density for gaze location in every frame and the probability that the person is looking in that particular frame. A key aspect of our approach is an end-to-end model that jointly estimates: saliency, gaze pose, and geometric relationships between views while only using gaze as supervision. Visualizations suggest that the model learns to internally solve these intermediate tasks automatically without additional supervision. Experiments show that our approach follows gaze in video better than existing approaches, enabling a richer understanding of human activities in video.\",\"PeriodicalId\":6559,\"journal\":{\"name\":\"2017 IEEE International Conference on Computer Vision (ICCV)\",\"volume\":\"42 1\",\"pages\":\"1444-1452\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"65\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Conference on Computer Vision (ICCV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCV.2017.160\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Computer Vision (ICCV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2017.160","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 65

摘要

跟随视频中人物的目光是理解人和行为的重要信号。在本文中，我们提出了一种通过预测一个人(在视频中)正在看的位置来跟踪视频中凝视的方法，即使物体在不同的帧中。我们收集了一个新的数据集videaze，我们用它作为训练和评估模型的基准。给定一个帧中有一个人，我们的模型估计每个帧中凝视位置的密度以及这个人在那个特定帧中注视的概率。我们方法的一个关键方面是一个端到端模型，它可以在只使用凝视作为监督的情况下，共同估计视图之间的显著性、凝视姿势和几何关系。可视化表明，模型学习内部自动解决这些中间任务，而不需要额外的监督。实验表明，我们的方法比现有的方法更好地跟踪视频中的凝视，从而能够更丰富地理解视频中的人类活动。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Following Gaze in Video

Following the gaze of people inside videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze in video by predicting where a person (in the video) is looking even when the object is in a different frame. We collect VideoGaze, a new dataset which we use as a benchmark to both train and evaluate models. Given one frame with a person in it, our model estimates a density for gaze location in every frame and the probability that the person is looking in that particular frame. A key aspect of our approach is an end-to-end model that jointly estimates: saliency, gaze pose, and geometric relationships between views while only using gaze as supervision. Visualizations suggest that the model learns to internally solve these intermediate tasks automatically without additional supervision. Experiments show that our approach follows gaze in video better than existing approaches, enabling a richer understanding of human activities in video.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE International Conference on Computer Vision (ICCV)

自引率

0.00%

发文量