{"title":"基于视频的人物再识别的长-短时间信息融合体系结构","authors":"Xingzhe Sun, Shanna Zhuang, Zhengyou Wang","doi":"10.1109/ICCEAI52939.2021.00027","DOIUrl":null,"url":null,"abstract":"Person re-identification is a major application of computer vision in reality. Since the data obtained by monitoring in real life is often in video format, and the walking poses of pedestrians are different, in addition to the appearance of pedestrians, how to obtain the motion features of pedestrians, is extremely important for video-based person re-identification. Therefore, for the temporal information of the video, we propose a Long-short Temporal Information Fusion (LSTIF) network. We aggregate temporal information from two perspectives, short-term features containing detailed information and long-term features containing global information. Simultaneously, in order to reduce the amount of calculation, this network also uses non-local blocks, and extend the outpu feature map to the same size as the input, which is convenient for calculation. This paper verifies the effectiveness of our method on two commonly used datasets iLIDS-VID and DukeMTMC-VideoReID.","PeriodicalId":331409,"journal":{"name":"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LSTIF:Long-short Temporal Information Fusion Architecture for Video-based Person Re-identification\",\"authors\":\"Xingzhe Sun, Shanna Zhuang, Zhengyou Wang\",\"doi\":\"10.1109/ICCEAI52939.2021.00027\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Person re-identification is a major application of computer vision in reality. 
Since the data obtained by monitoring in real life is often in video format, and the walking poses of pedestrians are different, in addition to the appearance of pedestrians, how to obtain the motion features of pedestrians, is extremely important for video-based person re-identification. Therefore, for the temporal information of the video, we propose a Long-short Temporal Information Fusion (LSTIF) network. We aggregate temporal information from two perspectives, short-term features containing detailed information and long-term features containing global information. Simultaneously, in order to reduce the amount of calculation, this network also uses non-local blocks, and extend the outpu feature map to the same size as the input, which is convenient for calculation. This paper verifies the effectiveness of our method on two commonly used datasets iLIDS-VID and DukeMTMC-VideoReID.\",\"PeriodicalId\":331409,\"journal\":{\"name\":\"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCEAI52939.2021.00027\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Computer Engineering and Artificial Intelligence 
(ICCEAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCEAI52939.2021.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
LSTIF: Long-short Temporal Information Fusion Architecture for Video-based Person Re-identification
Person re-identification is a major real-world application of computer vision. Because surveillance footage is typically captured as video, and pedestrians differ in walking pose, extracting pedestrian motion features, in addition to appearance features, is essential for video-based person re-identification. We therefore propose a Long-short Temporal Information Fusion (LSTIF) network that exploits the temporal information in video. The network aggregates temporal information from two perspectives: short-term features that capture fine-grained detail and long-term features that capture global information. To reduce computation, it also employs non-local blocks and expands the output feature map to the same size as the input, which simplifies subsequent processing. We verify the effectiveness of our method on two widely used datasets, iLIDS-VID and DukeMTMC-VideoReID.
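The abstract describes two ingredients without implementation detail: fusing long-term (global) and short-term (fine-grained) temporal features, and non-local blocks that relate frames across time. The sketch below is a hypothetical illustration of those two ideas, not the paper's actual architecture: `long_short_fusion`, `nonlocal_attention`, and the choice of adjacent-frame differences for short-term motion are all assumptions for exposition.

```python
import numpy as np

def long_short_fusion(frame_features: np.ndarray) -> np.ndarray:
    """Toy long/short temporal fusion over per-frame features.

    frame_features: (T, D) array, one D-dim appearance feature per frame.
    Returns a (2*D,) clip-level descriptor. This is an illustrative
    stand-in for LSTIF's fusion, not the published design.
    """
    # Long-term branch: global temporal average captures clip-level info.
    long_term = frame_features.mean(axis=0)
    # Short-term branch: adjacent-frame differences as a crude motion cue.
    short_term = np.abs(np.diff(frame_features, axis=0)).mean(axis=0)
    # Fuse the two views by concatenation.
    return np.concatenate([long_term, short_term])

def nonlocal_attention(x: np.ndarray) -> np.ndarray:
    """Minimal non-local (self-attention) block over the time axis.

    x: (T, D). Each frame's output is a softmax-weighted sum of all
    frames, so the output has the same (T, D) shape as the input --
    mirroring the abstract's point that the non-local output is kept
    the same size as the input.
    """
    scores = x @ x.T                      # (T, T) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return attn @ x                       # (T, D) aggregated features
```

In a real network these operations would act on learned embeddings inside a CNN backbone; here plain NumPy arrays stand in to show the data flow and shapes only.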