Guanqun Ding, Nevrez Imamoglu, Ali Caglayan, M. Murakawa, Ryosuke Nakamura
{"title":"FBR-CNN: A Feedback Recurrent Network for Video Saliency Detection","authors":"Guanqun Ding, Nevrez Imamoglu, Ali Caglayan, M. Murakawa, Ryosuke Nakamura","doi":"10.1109/mlsp52302.2021.9596383","DOIUrl":null,"url":null,"abstract":"Different from the saliency detection on static images, the context and dynamic information from video sequences play an important role in saliency prediction on dynamic scenes. In this work, we propose a novel feedback recurrent network (FBR-CNN) to simultaneously learn the abundant contextual and dynamic features for video saliency detection. In order to learn the dynamic relationship from video frames, we incorporate the recurrent convolutional layers into the standard feed-forward CNN model. With multiple video frames as inputs, the long-term dependence and contextual relevance over time could be strengthen due to the powerful recurrent units. Unlike the feed-forward only CNN models, we propose to feed back the learned CNN features from high-level feedback recurrent blocks (FBR-block) to low-level layers to further enhance the the contextual representations. Experiments on the public video saliency benchmarks demonstrate that the model with feedback connections and recurrent units can dramatically improve the performance of the baseline feedforward structure. Moreover, although the proposed model has few parameters (~6.5 MB), it achieves comparable performance against the existing video saliency approaches.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/mlsp52302.2021.9596383","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Unlike saliency detection on static images, saliency prediction on dynamic scenes relies heavily on the contextual and dynamic information carried by video sequences. In this work, we propose a novel feedback recurrent network (FBR-CNN) that simultaneously learns rich contextual and dynamic features for video saliency detection. To capture dynamic relationships across video frames, we incorporate recurrent convolutional layers into a standard feed-forward CNN. With multiple video frames as input, the recurrent units strengthen long-term dependence and contextual relevance over time. Unlike feed-forward-only CNN models, we feed the features learned by high-level feedback recurrent blocks (FBR-blocks) back to low-level layers to further enhance the contextual representations. Experiments on public video saliency benchmarks demonstrate that the model with feedback connections and recurrent units dramatically improves on the baseline feed-forward structure. Moreover, although the proposed model has few parameters (~6.5 MB), it achieves performance comparable to existing video saliency approaches.
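To make the two architectural ideas in the abstract concrete, the following is a minimal conceptual sketch in PyTorch of (a) a recurrent convolutional unit inserted into a feed-forward CNN and (b) feeding high-level recurrent features back to the low-level layers on the next time step. It is not the authors' implementation; the class names (ConvGRUCell, FeedbackRecurrentNet), channel sizes, and the choice of a ConvGRU as the recurrent unit are illustrative assumptions, since the paper's FBR-block details are not reproduced here.

```python
# Illustrative sketch only: a ConvGRU-style recurrent block whose high-level
# state is fed back to the low-level features at the next time step.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: a recurrent unit that operates on feature maps."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)  # update/reset gates
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)       # candidate state
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde


class FeedbackRecurrentNet(nn.Module):
    """Toy feed-forward CNN with a recurrent high-level block whose state is
    fed back to the low-level layer when processing the next frame."""

    def __init__(self, hid_ch=32):
        super().__init__()
        # Low-level layer sees the RGB frame plus the fed-back high-level state.
        self.low = nn.Conv2d(3 + hid_ch, hid_ch, 3, padding=1)
        self.down = nn.Conv2d(hid_ch, hid_ch, 3, stride=2, padding=1)
        self.fbr = ConvGRUCell(hid_ch, hid_ch)   # high-level feedback recurrent block
        self.head = nn.Conv2d(hid_ch, 1, 1)      # saliency prediction head

    def forward(self, frames):
        # frames: (B, T, 3, H, W) clip of consecutive video frames.
        b, t, _, h, w = frames.shape
        state = None
        feedback = frames.new_zeros(b, self.fbr.hid_ch, h, w)
        maps = []
        for i in range(t):
            x = torch.cat([frames[:, i], feedback], dim=1)    # inject fed-back features
            low = F.relu(self.low(x))
            state = self.fbr(self.down(low), state)           # recurrent high-level features
            feedback = F.interpolate(state, size=(h, w), mode="bilinear",
                                     align_corners=False)     # feedback to the low level
            maps.append(torch.sigmoid(self.head(feedback)))
        return torch.stack(maps, dim=1)                       # (B, T, 1, H, W) saliency maps


if __name__ == "__main__":
    clip = torch.randn(2, 4, 3, 64, 64)            # toy 4-frame clip
    print(FeedbackRecurrentNet()(clip).shape)      # torch.Size([2, 4, 1, 64, 64])
```

In this sketch the recurrent state carries information across frames (the "long-term dependence"), while the upsampled high-level state concatenated with the next frame plays the role of the feedback connection from high-level to low-level layers described in the abstract.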