学习环顾四周:视频显著性预测的深度强化学习代理

2021 International Conference on Visual Communications and Image Processing (VCIP) Pub Date : 2021-12-05 DOI:10.1109/VCIP53242.2021.9675397

Yiran Tao, Yaosi Hu, Zhenzhong Chen

{"title":"学习环顾四周:视频显著性预测的深度强化学习代理","authors":"Yiran Tao, Yaosi Hu, Zhenzhong Chen","doi":"10.1109/VCIP53242.2021.9675397","DOIUrl":null,"url":null,"abstract":"In the video saliency prediction task, one of the key issues is the utilization of temporal contextual information of keyframes. In this paper, a deep reinforcement learning agent for video saliency prediction is proposed, designed to look around adjacent frames and adaptively generate a salient contextual window that contains the most correlated information of keyframe for saliency prediction. More specifically, an action set step by step decides whether to expand the window, meanwhile a state set and reward function evaluate the effectiveness of the current window. The deep Q-learning algorithm is followed to train the agent to learn a policy to achieve its goal. The proposed agent can be regarded as plug-and-play which is compatible with generic video saliency prediction models. Experimental results on various datasets demonstrate that our method can achieve an advanced performance.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learn to Look Around: Deep Reinforcement Learning Agent for Video Saliency Prediction\",\"authors\":\"Yiran Tao, Yaosi Hu, Zhenzhong Chen\",\"doi\":\"10.1109/VCIP53242.2021.9675397\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the video saliency prediction task, one of the key issues is the utilization of temporal contextual information of keyframes. In this paper, a deep reinforcement learning agent for video saliency prediction is proposed, designed to look around adjacent frames and adaptively generate a salient contextual window that contains the most correlated information of keyframe for saliency prediction. More specifically, an action set step by step decides whether to expand the window, meanwhile a state set and reward function evaluate the effectiveness of the current window. The deep Q-learning algorithm is followed to train the agent to learn a policy to achieve its goal. The proposed agent can be regarded as plug-and-play which is compatible with generic video saliency prediction models. Experimental results on various datasets demonstrate that our method can achieve an advanced performance.\",\"PeriodicalId\":114062,\"journal\":{\"name\":\"2021 International Conference on Visual Communications and Image Processing (VCIP)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Visual Communications and Image Processing (VCIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/VCIP53242.2021.9675397\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Visual Communications and Image Processing (VCIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VCIP53242.2021.9675397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在视频显著性预测任务中，关键帧的时间上下文信息的利用是一个关键问题。本文提出了一种用于视频显著性预测的深度强化学习智能体，旨在查看相邻帧并自适应生成包含关键帧最相关信息的显著性上下文窗口以进行显著性预测。具体来说，一步一步的动作集决定是否扩大窗口，同时状态集和奖励函数评估当前窗口的有效性。遵循深度q -学习算法来训练代理学习策略以实现其目标。所提出的智能体可视为即插即用，与一般的视频显著性预测模型兼容。在各种数据集上的实验结果表明，我们的方法可以达到更高的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Learn to Look Around: Deep Reinforcement Learning Agent for Video Saliency Prediction

In the video saliency prediction task, one of the key issues is the utilization of temporal contextual information of keyframes. In this paper, a deep reinforcement learning agent for video saliency prediction is proposed, designed to look around adjacent frames and adaptively generate a salient contextual window that contains the most correlated information of keyframe for saliency prediction. More specifically, an action set step by step decides whether to expand the window, meanwhile a state set and reward function evaluate the effectiveness of the current window. The deep Q-learning algorithm is followed to train the agent to learn a policy to achieve its goal. The proposed agent can be regarded as plug-and-play which is compatible with generic video saliency prediction models. Experimental results on various datasets demonstrate that our method can achieve an advanced performance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 International Conference on Visual Communications and Image Processing (VCIP)

自引率

0.00%

发文量