使用候选选择从人类凝视中学习视频显著性

2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date : 2013-06-23 DOI:10.1109/CVPR.2013.152

Dmitry Rudoy, Dan B. Goldman, Eli Shechtman, Lihi Zelnik-Manor

{"title":"使用候选选择从人类凝视中学习视频显著性","authors":"Dmitry Rudoy, Dan B. Goldman, Eli Shechtman, Lihi Zelnik-Manor","doi":"10.1109/CVPR.2013.152","DOIUrl":null,"url":null,"abstract":"During recent years remarkable progress has been made in visual saliency modeling. Our interest is in video saliency. Since videos are fundamentally different from still images, they are viewed differently by human observers. For example, the time each video frame is observed is a fraction of a second, while a still image can be viewed leisurely. Therefore, video saliency estimation methods should differ substantially from image saliency methods. In this paper we propose a novel method for video saliency estimation, which is inspired by the way people watch videos. We explicitly model the continuity of the video by predicting the saliency map of a given frame, conditioned on the map from the previous frame. Furthermore, accuracy and computation speed are improved by restricting the salient locations to a carefully selected candidate set. We validate our method using two gaze-tracked video datasets and show we outperform the state-of-the-art.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"15 1","pages":"1147-1154"},"PeriodicalIF":0.0000,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"152","resultStr":"{\"title\":\"Learning Video Saliency from Human Gaze Using Candidate Selection\",\"authors\":\"Dmitry Rudoy, Dan B. Goldman, Eli Shechtman, Lihi Zelnik-Manor\",\"doi\":\"10.1109/CVPR.2013.152\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"During recent years remarkable progress has been made in visual saliency modeling. Our interest is in video saliency. Since videos are fundamentally different from still images, they are viewed differently by human observers. For example, the time each video frame is observed is a fraction of a second, while a still image can be viewed leisurely. Therefore, video saliency estimation methods should differ substantially from image saliency methods. In this paper we propose a novel method for video saliency estimation, which is inspired by the way people watch videos. We explicitly model the continuity of the video by predicting the saliency map of a given frame, conditioned on the map from the previous frame. Furthermore, accuracy and computation speed are improved by restricting the salient locations to a carefully selected candidate set. We validate our method using two gaze-tracked video datasets and show we outperform the state-of-the-art.\",\"PeriodicalId\":6343,\"journal\":{\"name\":\"2013 IEEE Conference on Computer Vision and Pattern Recognition\",\"volume\":\"15 1\",\"pages\":\"1147-1154\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-06-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"152\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE Conference on Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR.2013.152\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Conference on Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2013.152","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 152

摘要

近年来，视觉显著性建模取得了显著进展。我们感兴趣的是视频的显著性。由于视频从根本上不同于静止图像，因此人类观察者对它们的看法是不同的。例如，观察每个视频帧的时间是几分之一秒，而静止图像可以悠闲地观看。因此，视频显著性估计方法应该与图像显著性方法有很大的不同。本文从人们观看视频的方式中得到启发，提出了一种新的视频显著性估计方法。我们通过预测给定帧的显著性映射来明确地建模视频的连续性，并以前一帧的映射为条件。此外，通过将显著位置限制在一个精心选择的候选集中，提高了准确性和计算速度。我们使用两个注视跟踪视频数据集验证了我们的方法，并表明我们的性能优于最先进的技术。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Learning Video Saliency from Human Gaze Using Candidate Selection

During recent years remarkable progress has been made in visual saliency modeling. Our interest is in video saliency. Since videos are fundamentally different from still images, they are viewed differently by human observers. For example, the time each video frame is observed is a fraction of a second, while a still image can be viewed leisurely. Therefore, video saliency estimation methods should differ substantially from image saliency methods. In this paper we propose a novel method for video saliency estimation, which is inspired by the way people watch videos. We explicitly model the continuity of the video by predicting the saliency map of a given frame, conditioned on the map from the previous frame. Furthermore, accuracy and computation speed are improved by restricting the salient locations to a carefully selected candidate set. We validate our method using two gaze-tracked video datasets and show we outperform the state-of-the-art.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 IEEE Conference on Computer Vision and Pattern Recognition

自引率

0.00%

发文量