{"title":"学习环顾四周:视频显著性预测的深度强化学习代理","authors":"Yiran Tao, Yaosi Hu, Zhenzhong Chen","doi":"10.1109/VCIP53242.2021.9675397","DOIUrl":null,"url":null,"abstract":"In the video saliency prediction task, one of the key issues is the utilization of temporal contextual information of keyframes. In this paper, a deep reinforcement learning agent for video saliency prediction is proposed, designed to look around adjacent frames and adaptively generate a salient contextual window that contains the most correlated information of keyframe for saliency prediction. More specifically, an action set step by step decides whether to expand the window, meanwhile a state set and reward function evaluate the effectiveness of the current window. The deep Q-learning algorithm is followed to train the agent to learn a policy to achieve its goal. The proposed agent can be regarded as plug-and-play which is compatible with generic video saliency prediction models. Experimental results on various datasets demonstrate that our method can achieve an advanced performance.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learn to Look Around: Deep Reinforcement Learning Agent for Video Saliency Prediction\",\"authors\":\"Yiran Tao, Yaosi Hu, Zhenzhong Chen\",\"doi\":\"10.1109/VCIP53242.2021.9675397\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the video saliency prediction task, one of the key issues is the utilization of temporal contextual information of keyframes. In this paper, a deep reinforcement learning agent for video saliency prediction is proposed, designed to look around adjacent frames and adaptively generate a salient contextual window that contains the most correlated information of keyframe for saliency prediction. More specifically, an action set step by step decides whether to expand the window, meanwhile a state set and reward function evaluate the effectiveness of the current window. The deep Q-learning algorithm is followed to train the agent to learn a policy to achieve its goal. The proposed agent can be regarded as plug-and-play which is compatible with generic video saliency prediction models. 
Experimental results on various datasets demonstrate that our method can achieve an advanced performance.\",\"PeriodicalId\":114062,\"journal\":{\"name\":\"2021 International Conference on Visual Communications and Image Processing (VCIP)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Visual Communications and Image Processing (VCIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/VCIP53242.2021.9675397\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Visual Communications and Image Processing (VCIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VCIP53242.2021.9675397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In video saliency prediction, a key issue is how to exploit the temporal contextual information surrounding each keyframe. In this paper, a deep reinforcement learning agent for video saliency prediction is proposed: it looks around adjacent frames and adaptively generates a contextual window that contains the information most correlated with the keyframe for saliency prediction. More specifically, an action decides step by step whether to expand the window, while a state representation and a reward function evaluate the effectiveness of the current window. The deep Q-learning algorithm is used to train the agent to learn a policy that achieves this goal. The proposed agent is plug-and-play and compatible with generic video saliency prediction models. Experimental results on various datasets demonstrate that our method achieves strong performance.
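To make the abstract's window-expansion idea concrete, below is a minimal sketch of a deep Q-learning agent that repeatedly decides whether to grow a temporal context window or stop. Everything here is a hypothetical stand-in, not the authors' implementation: the state dimension, the three actions (expand left, expand right, stop), the `QNetwork` and `WindowEnv` classes, and the random placeholder reward (which in the paper would reflect how much the current window helps saliency prediction).

```python
# Hypothetical sketch of deep Q-learning for context-window expansion.
# The state encoding, actions, and reward below are assumptions for
# illustration only; they do not reproduce the paper's actual design.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM = 16   # assumed feature summary of the current contextual window
N_ACTIONS = 3    # assumed actions: 0 = expand left, 1 = expand right, 2 = stop


class QNetwork(nn.Module):
    """Tiny MLP that scores each window action for the current state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, state):
        return self.net(state)


class WindowEnv:
    """Toy environment: states are random vectors and the reward is a
    random placeholder for the gain in saliency-prediction quality."""
    def reset(self):
        self.steps = 0
        return torch.randn(STATE_DIM)

    def step(self, action):
        self.steps += 1
        done = action == 2 or self.steps >= 8  # stop action or window limit
        reward = random.uniform(-1.0, 1.0)     # stand-in for metric gain
        return torch.randn(STATE_DIM), reward, done


def train(episodes=50, gamma=0.9, eps=0.2):
    q_net = QNetwork()
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    env = WindowEnv()
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy selection over the window actions
            if random.random() < eps:
                action = random.randrange(N_ACTIONS)
            else:
                action = q_net(state).argmax().item()
            next_state, reward, done = env.step(action)
            # one-step TD target, as in standard deep Q-learning
            with torch.no_grad():
                target = reward + (0.0 if done else gamma * q_net(next_state).max())
            loss = F.mse_loss(q_net(state)[action], torch.as_tensor(target))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            state = next_state
    return q_net


if __name__ == "__main__":
    train()
```

In the paper's setting, the learned policy would be queried at inference time to pick a per-keyframe window, which is then fed to an off-the-shelf saliency model, which is what makes the agent plug-and-play with generic video saliency predictors.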