{"title":"基于视频的人物再识别的长-短时间信息融合体系结构","authors":"Xingzhe Sun, Shanna Zhuang, Zhengyou Wang","doi":"10.1109/ICCEAI52939.2021.00027","DOIUrl":null,"url":null,"abstract":"Person re-identification is a major application of computer vision in reality. Since the data obtained by monitoring in real life is often in video format, and the walking poses of pedestrians are different, in addition to the appearance of pedestrians, how to obtain the motion features of pedestrians, is extremely important for video-based person re-identification. Therefore, for the temporal information of the video, we propose a Long-short Temporal Information Fusion (LSTIF) network. We aggregate temporal information from two perspectives, short-term features containing detailed information and long-term features containing global information. Simultaneously, in order to reduce the amount of calculation, this network also uses non-local blocks, and extend the outpu feature map to the same size as the input, which is convenient for calculation. This paper verifies the effectiveness of our method on two commonly used datasets iLIDS-VID and DukeMTMC-VideoReID.","PeriodicalId":331409,"journal":{"name":"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LSTIF:Long-short Temporal Information Fusion Architecture for Video-based Person Re-identification\",\"authors\":\"Xingzhe Sun, Shanna Zhuang, Zhengyou Wang\",\"doi\":\"10.1109/ICCEAI52939.2021.00027\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Person re-identification is a major application of computer vision in reality. 
Since the data obtained by monitoring in real life is often in video format, and the walking poses of pedestrians are different, in addition to the appearance of pedestrians, how to obtain the motion features of pedestrians, is extremely important for video-based person re-identification. Therefore, for the temporal information of the video, we propose a Long-short Temporal Information Fusion (LSTIF) network. We aggregate temporal information from two perspectives, short-term features containing detailed information and long-term features containing global information. Simultaneously, in order to reduce the amount of calculation, this network also uses non-local blocks, and extend the outpu feature map to the same size as the input, which is convenient for calculation. This paper verifies the effectiveness of our method on two commonly used datasets iLIDS-VID and DukeMTMC-VideoReID.\",\"PeriodicalId\":331409,\"journal\":{\"name\":\"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCEAI52939.2021.00027\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Computer Engineering and Artificial Intelligence 
(ICCEAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCEAI52939.2021.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
LSTIF: Long-short Temporal Information Fusion Architecture for Video-based Person Re-identification
Person re-identification is a major real-world application of computer vision. Because surveillance footage is typically captured as video, and pedestrians differ in walking pose, extracting pedestrian motion features, in addition to appearance features, is essential for video-based person re-identification. We therefore propose a Long-short Temporal Information Fusion (LSTIF) network that exploits the temporal information in video. The network aggregates temporal information from two perspectives: short-term features that capture fine-grained detail and long-term features that capture global information. To reduce computation, it also employs non-local blocks and expands the output feature map to the same size as the input, which simplifies subsequent processing. We verify the effectiveness of our method on two widely used datasets, iLIDS-VID and DukeMTMC-VideoReID.
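The abstract describes two ingredients without implementation detail: fusing long-term (global) and short-term (fine-grained) temporal features, and non-local blocks that relate frames across time. The sketch below is a hypothetical illustration of those two ideas, not the paper's actual architecture: `long_short_fusion`, `nonlocal_attention`, and the choice of adjacent-frame differences for short-term motion are all assumptions for exposition.

```python
import numpy as np

def long_short_fusion(frame_features: np.ndarray) -> np.ndarray:
    """Toy long/short temporal fusion over per-frame features.

    frame_features: (T, D) array, one D-dim appearance feature per frame.
    Returns a (2*D,) clip-level descriptor. This is an illustrative
    stand-in for LSTIF's fusion, not the published design.
    """
    # Long-term branch: global temporal average captures clip-level info.
    long_term = frame_features.mean(axis=0)
    # Short-term branch: adjacent-frame differences as a crude motion cue.
    short_term = np.abs(np.diff(frame_features, axis=0)).mean(axis=0)
    # Fuse the two views by concatenation.
    return np.concatenate([long_term, short_term])

def nonlocal_attention(x: np.ndarray) -> np.ndarray:
    """Minimal non-local (self-attention) block over the time axis.

    x: (T, D). Each frame's output is a softmax-weighted sum of all
    frames, so the output has the same (T, D) shape as the input --
    mirroring the abstract's point that the non-local output is kept
    the same size as the input.
    """
    scores = x @ x.T                      # (T, T) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return attn @ x                       # (T, D) aggregated features
```

In a real network these operations would act on learned embeddings inside a CNN backbone; here plain NumPy arrays stand in to show the data flow and shapes only.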