Parallel Attention with Weighted Efficient Network for Video-Based Person Re-Identification

Proceedings of the 2021 5th International Conference on Innovation in Artificial Intelligence. Pub Date: 2021-03-05. DOI: 10.1145/3461353.3461357

Junting Yang, Z. Yang, Jing Zhou, Yong Zhao, Qifei Dai, Fuchi Li


Citations: 0

Abstract

In this paper, we propose a new approach to three problems that traditional video-based Re-ID methods leave unsolved: independent treatment of temporal and spatial information, shallow feature extraction, and heavy computation. Because weak feature extraction in traditional networks propagates errors to later stages, we design an attention network named Parallel Spatio-Temporal Attention (PSTA) to fuse spatio-temporal features. After extracting deep features, existing methods must stack convolutional operations to model large receptive fields; we instead use a Non-local operation to capture long-range dependencies directly. For the Non-local method, we propose an Attention-Like Similarity (ALS) that adaptively learns the weights of the similarity matrix and then filters out redundant similarities. To reduce the high complexity of the Non-local method while maintaining accuracy, we perform Spatial Pyramid Pooling (SPP) inside the Non-local structure, which also combines multi-scale features. Extensive experiments with ablation analysis show the effectiveness of our methods, which achieve state-of-the-art results on large-scale video datasets.
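The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of pooling inside a Non-local operation, not the authors' network. It assumes identity embeddings (no learned projections), a scaled dot-product similarity with softmax normalization, and hypothetical pooling sizes of 1, 2 and 4. ALS and PSTA are not modeled here. The function names `adaptive_avg_pool`, `pyramid_pool` and `nonlocal_with_spp` are illustrative choices, not names from the paper.

```python
import math


def adaptive_avg_pool(feat, out_size):
    """Average-pool an H x W grid of C-dim vectors into out_size x out_size bins."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    pooled = []
    for i in range(out_size):
        # Bin boundaries: floor for the start, ceil for the end, so bins are never empty.
        r0, r1 = (i * h) // out_size, math.ceil((i + 1) * h / out_size)
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, math.ceil((j + 1) * w / out_size)
            cells = [feat[r][s] for r in range(r0, r1) for s in range(c0, c1)]
            pooled.append([sum(v[k] for v in cells) / len(cells) for k in range(c)])
    return pooled


def pyramid_pool(feat, sizes=(1, 2, 4)):
    """Concatenate pooled vectors from every pyramid level (S = sum of s*s vectors)."""
    pooled = []
    for s in sizes:
        pooled.extend(adaptive_avg_pool(feat, s))
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nonlocal_with_spp(feat, sizes=(1, 2, 4)):
    """Non-local aggregation where keys/values are pyramid-pooled (assumed design).

    Every position attends to S pooled vectors instead of all H*W positions,
    so the similarity matrix is (H*W) x S rather than (H*W) x (H*W).
    """
    c = len(feat[0][0])
    keys = pyramid_pool(feat, sizes)  # also used as values (identity embeddings)
    out = []
    for row in feat:
        out_row = []
        for q in row:
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in keys]
            weights = softmax(scores)
            agg = [sum(wt * k[ch] for wt, k in zip(weights, keys)) for ch in range(c)]
            # Residual connection, as is common in Non-local blocks.
            out_row.append([qi + ai for qi, ai in zip(q, agg)])
        out.append(out_row)
    return out
```

Under these assumptions, a 16 x 8 feature map has 128 positions, and pooling sizes 1, 2 and 4 yield 1 + 4 + 16 = 21 pooled vectors, so each position compares against 21 vectors instead of 128. Mixing several pooling sizes is also what lets coarse and fine context enter the same aggregation.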
