Guanqun Ding, Nevrez Imamoglu, Ali Caglayan, M. Murakawa, Ryosuke Nakamura
{"title":"FBR-CNN: A Feedback Recurrent Network for Video Saliency Detection","authors":"Guanqun Ding, Nevrez Imamoglu, Ali Caglayan, M. Murakawa, Ryosuke Nakamura","doi":"10.1109/mlsp52302.2021.9596383","DOIUrl":null,"url":null,"abstract":"Different from the saliency detection on static images, the context and dynamic information from video sequences play an important role in saliency prediction on dynamic scenes. In this work, we propose a novel feedback recurrent network (FBR-CNN) to simultaneously learn the abundant contextual and dynamic features for video saliency detection. In order to learn the dynamic relationship from video frames, we incorporate the recurrent convolutional layers into the standard feed-forward CNN model. With multiple video frames as inputs, the long-term dependence and contextual relevance over time could be strengthen due to the powerful recurrent units. Unlike the feed-forward only CNN models, we propose to feed back the learned CNN features from high-level feedback recurrent blocks (FBR-block) to low-level layers to further enhance the the contextual representations. Experiments on the public video saliency benchmarks demonstrate that the model with feedback connections and recurrent units can dramatically improve the performance of the baseline feedforward structure. Moreover, although the proposed model has few parameters (~6.5 MB), it achieves comparable performance against the existing video saliency approaches.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/mlsp52302.2021.9596383","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Unlike saliency detection on static images, saliency prediction on dynamic scenes relies heavily on the contextual and dynamic information carried by video sequences. In this work, we propose a novel feedback recurrent network (FBR-CNN) that simultaneously learns rich contextual and dynamic features for video saliency detection. To capture dynamic relationships across video frames, we incorporate recurrent convolutional layers into a standard feed-forward CNN. With multiple video frames as input, the recurrent units strengthen long-term dependence and contextual relevance over time. Unlike feed-forward-only CNN models, we feed the features learned by high-level feedback recurrent blocks (FBR-blocks) back to low-level layers to further enhance the contextual representations. Experiments on public video saliency benchmarks demonstrate that the model with feedback connections and recurrent units dramatically improves on the baseline feed-forward structure. Moreover, although the proposed model has few parameters (~6.5 MB), it achieves performance comparable to existing video saliency approaches.
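To make the two architectural ideas in the abstract concrete, the following is a minimal conceptual sketch in PyTorch of (a) a recurrent convolutional unit inserted into a feed-forward CNN and (b) feeding high-level recurrent features back to the low-level layers on the next time step. It is not the authors' implementation; the class names (ConvGRUCell, FeedbackRecurrentNet), channel sizes, and the choice of a ConvGRU as the recurrent unit are illustrative assumptions, since the paper's FBR-block details are not reproduced here.

```python
# Illustrative sketch only: a ConvGRU-style recurrent block whose high-level
# state is fed back to the low-level features at the next time step.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: a recurrent unit that operates on feature maps."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)  # update/reset gates
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)       # candidate state
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde


class FeedbackRecurrentNet(nn.Module):
    """Toy feed-forward CNN with a recurrent high-level block whose state is
    fed back to the low-level layer when processing the next frame."""

    def __init__(self, hid_ch=32):
        super().__init__()
        # Low-level layer sees the RGB frame plus the fed-back high-level state.
        self.low = nn.Conv2d(3 + hid_ch, hid_ch, 3, padding=1)
        self.down = nn.Conv2d(hid_ch, hid_ch, 3, stride=2, padding=1)
        self.fbr = ConvGRUCell(hid_ch, hid_ch)   # high-level feedback recurrent block
        self.head = nn.Conv2d(hid_ch, 1, 1)      # saliency prediction head

    def forward(self, frames):
        # frames: (B, T, 3, H, W) clip of consecutive video frames.
        b, t, _, h, w = frames.shape
        state = None
        feedback = frames.new_zeros(b, self.fbr.hid_ch, h, w)
        maps = []
        for i in range(t):
            x = torch.cat([frames[:, i], feedback], dim=1)    # inject fed-back features
            low = F.relu(self.low(x))
            state = self.fbr(self.down(low), state)           # recurrent high-level features
            feedback = F.interpolate(state, size=(h, w), mode="bilinear",
                                     align_corners=False)     # feedback to the low level
            maps.append(torch.sigmoid(self.head(feedback)))
        return torch.stack(maps, dim=1)                       # (B, T, 1, H, W) saliency maps


if __name__ == "__main__":
    clip = torch.randn(2, 4, 3, 64, 64)            # toy 4-frame clip
    print(FeedbackRecurrentNet()(clip).shape)      # torch.Size([2, 4, 1, 64, 64])
```

In this sketch the recurrent state carries information across frames (the "long-term dependence"), while the upsampled high-level state concatenated with the next frame plays the role of the feedback connection from high-level to low-level layers described in the abstract.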