From Seed Discovery to Deep Reconstruction: Predicting Saliency in Crowd via Deep Networks

Yanhao Zhang, Lei Qin, Qingming Huang, Kuiyuan Yang, Jun Zhang, H. Yao

Proceedings of the 24th ACM International Conference on Multimedia, October 2016. DOI: 10.1145/2964284.2967185
Although saliency prediction in crowds has recently been recognized as an essential task for video analysis, it has not yet been comprehensively explored. The challenge lies in the fact that eye fixations in crowded scenes are inherently "distinct" and "multi-modal", unlike those in regular scenes. Existing saliency prediction schemes typically rely on hand-designed features within a shallow learning paradigm, neglecting the underlying characteristics of crowded scenes. In this paper, we propose a saliency prediction model dedicated to crowd videos with two novelties: 1) distinct units are discovered using deep representations learned by a Stacked Denoising Auto-Encoder (SDAE), taking into account the perceptual properties of crowd saliency; 2) contrast-based saliency is measured through the deep reconstruction errors of a second SDAE trained on all units except the distinct ones. The two stages are integrated into a unified model for online processing of crowd saliency. Extensive evaluations on two crowd video benchmark datasets demonstrate that our two-stage SDAE approach effectively captures the crowd saliency mechanism and achieves significantly better results than state-of-the-art methods, with robustness to parameter settings.
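
To make the second novelty concrete, the following is a minimal sketch (not the authors' code) of the reconstruction-error idea: a denoising auto-encoder is trained only on ordinary units, so units it reconstructs poorly score as salient. For brevity the sketch uses a single denoising layer rather than a stacked SDAE, flattened feature vectors stand in for the paper's spatio-temporal units, and all names (DenoisingAE, reconstruction_saliency) and dimensions are illustrative assumptions.

```python
# Sketch of contrast-based saliency via reconstruction error (illustrative,
# single-layer DAE in place of the paper's stacked SDAE).
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Corrupt the input with Gaussian noise during training only.
        if self.training:
            x = x + self.noise_std * torch.randn_like(x)
        return self.decoder(self.encoder(x))

def train_dae(model: DenoisingAE, units: torch.Tensor,
              epochs: int = 200, lr: float = 1e-3) -> DenoisingAE:
    """Train the DAE to reconstruct clean units from corrupted inputs."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(units), units)
        loss.backward()
        opt.step()
    return model

def reconstruction_saliency(model: DenoisingAE, units: torch.Tensor) -> torch.Tensor:
    """Per-unit saliency: units the model reconstructs poorly score high."""
    model.eval()  # disables the corruption noise
    with torch.no_grad():
        err = ((model(units) - units) ** 2).mean(dim=1)
    # Normalize to [0, 1] so scores can be used as a saliency map over units.
    return (err - err.min()) / (err.max() - err.min() + 1e-8)

if __name__ == "__main__":
    torch.manual_seed(0)
    ordinary = torch.randn(256, 64)            # "background" units used for training
    distinct = torch.randn(8, 64) * 3.0 + 2.0  # outlying, attention-grabbing units
    dae = train_dae(DenoisingAE(in_dim=64, hidden_dim=32), ordinary)
    scores = reconstruction_saliency(dae, torch.cat([ordinary, distinct]))
    print("mean ordinary score:", scores[:256].mean().item())
    print("mean distinct score:", scores[256:].mean().item())
```

Because the model only ever sees ordinary units during training, the distinct units land far from its learned manifold and receive high reconstruction error, which is precisely the contrast signal the abstract describes.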