Junting Yang, Z. Yang, Jing Zhou, Yong Zhao, Qifei Dai, Fuchi Li
Title: Parallel Attention with Weighted Efficient Network for Video-Based Person Re-Identification
Published in: Proceedings of the 2021 5th International Conference on Innovation in Artificial Intelligence
Publication date: 2021-03-05
DOI: https://doi.org/10.1145/3461353.3461357
Citations: 0
Abstract
In this paper, we propose a new approach to three problems that traditional video-based Re-ID methods leave unsolved: the independence of temporal and spatial features, shallow feature extraction, and high computational cost. Because weak feature extraction in traditional networks causes problems that propagate to later stages, we design an attention network named Parallel Spatio-Temporal Attention (PSTA) to fuse spatio-temporal features. After extracting deep features, existing methods need to stack convolutional operations to model large receptive fields, so we instead use a Non-local operation to capture long-range dependencies directly. For the Non-local method, we propose an Attention-Like Similarity (ALS) that adaptively learns the weights of the similarity matrix and then filters out redundant similarities. To reduce the high complexity introduced by the Non-local method while maintaining accuracy, we perform Spatial Pyramid Pooling (SPP) inside the Non-local structure, which also combines multi-scale features. Extensive experiments with ablation analysis show the effectiveness of our methods, and state-of-the-art results are achieved on large-scale video datasets.
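The idea of placing Spatial Pyramid Pooling inside a Non-local structure can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the ALS formulation, the learned projections, or the pooling scales, so the sketch assumes plain dot-product softmax similarity, no learned weights, and hypothetical bin sizes of 1 and 2. All function names are illustrative. It only shows why pooling the key/value positions shrinks the similarity matrix from N x N to N x S, where S is the number of pooled positions.

```python
import math


def adaptive_avg_pool(feat, height, width, bins):
    """Average-pool a row-major H*W list of C-dim vectors to bins*bins vectors."""
    channels = len(feat[0])
    pooled = []
    for by in range(bins):
        y0 = (by * height) // bins
        y1 = math.ceil((by + 1) * height / bins)
        for bx in range(bins):
            x0 = (bx * width) // bins
            x1 = math.ceil((bx + 1) * width / bins)
            cells = [feat[y * width + x]
                     for y in range(y0, y1) for x in range(x0, x1)]
            pooled.append([sum(v[c] for v in cells) / len(cells)
                           for c in range(channels)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pyramid_non_local(feat, height, width, bin_sizes=(1, 2)):
    """Non-local aggregation where keys/values are pyramid-pooled features.

    Assumption: identity projections and dot-product similarity stand in
    for the learned embeddings and the ALS weighting named in the abstract.
    Returns the residual output and the number of pooled positions S.
    """
    # Concatenate pooled positions from every pyramid level (multi-scale).
    keys = []
    for bins in bin_sizes:
        keys.extend(adaptive_avg_pool(feat, height, width, bins))

    out = []
    for query in feat:
        # One row of the N x S similarity matrix instead of N x N.
        scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
        weights = softmax(scores)
        agg = [sum(w * key[c] for w, key in zip(weights, keys))
               for c in range(len(query))]
        # Residual connection, as in a standard non-local block.
        out.append([q + a for q, a in zip(query, agg)])
    return out, len(keys)
```

For a 4 x 4 feature map with the assumed bin sizes, each of the 16 query positions attends to 1 + 4 = 5 pooled positions rather than to all 16, which is the source of the complexity reduction; the different bin sizes supply the multi-scale context.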