Video-Based Person Re-Identification using Refined Attention Networks
Tanzila Rahman, Mrigank Rochan, Yang Wang
2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2019-09-01
DOI: 10.1109/AVSS.2019.8909869 (https://doi.org/10.1109/AVSS.2019.8909869)
Citations: 5
Abstract
We consider the problem of video-based person re-identification, where the goal is to identify a person across videos captured by different cameras. In this paper, we propose an efficient attention-based model for re-identifying persons from videos. Our method generates an attention score for each frame based on frame-level features. The attention scores of all frames in a video are used to produce a weighted feature vector for the input video. This video-level feature vector is then refined iteratively for re-identification. Unlike most existing deep learning methods, which use global or spatial representations, our approach focuses on attention scores. Extensive experiments on three benchmark datasets demonstrate that our method achieves state-of-the-art performance.
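The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify how scores are computed or how refinement works, so the code makes two assumptions: each frame is scored by a dot product with a hypothetical `score_vector`, and refinement re-scores the frames against the current video-level vector for a fixed number of steps. All function names and the toy features are illustrative only.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(frames, weights):
    """Combine frame-level features into one video-level feature vector."""
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def attention_pool(frames, score_vector):
    """Score each frame, normalize across frames, and pool.

    Assumption: a frame's score is its dot product with `score_vector`.
    """
    weights = softmax([dot(f, score_vector) for f in frames])
    return weighted_sum(frames, weights), weights


def refine(frames, score_vector, steps=2):
    """Hypothetical refinement loop: re-score frames against the current
    video-level vector and pool again. The paper's actual refinement
    scheme may differ.
    """
    video_vec, weights = attention_pool(frames, score_vector)
    for _ in range(steps):
        video_vec, weights = attention_pool(frames, video_vec)
    return video_vec, weights


# Toy input: three frames with 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_vec, weights = refine(frames, score_vector=[0.5, 0.5], steps=2)
```

In this toy input the third frame aligns best with the scoring vector, so it receives the largest weight, and each refinement step shifts a little more weight toward it. In a trained model the frame features would come from a CNN backbone and the scoring parameters would be learned rather than fixed by hand.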