Xu Cheng; Hao Yu; Kevin Ho Man Cheng; Zitong Yu; Guoying Zhao
IEEE Transactions on Multimedia, vol. 27, pp. 2015-2027. DOI: 10.1109/TMM.2024.3521822. Published: 2024-12-24 (Journal Article).
https://ieeexplore.ieee.org/document/10814078/
MDANet: Modality-Aware Domain Alignment Network for Visible-Infrared Person Re-Identification
Visible-infrared person re-identification (VI-ReID) is a challenging task in video surveillance. Most existing works achieve performance gains by aligning feature distributions or image styles across modalities, while multi-granularity information and domain knowledge are usually neglected. Motivated by these issues, we propose a novel modality-aware domain alignment network (MDANet) for VI-ReID, which exploits global-local context cues and a generalized domain alignment strategy to address modality discrepancy and poor generalization. Firstly, modality-aware global-local context attention (MGLCA) is proposed to capture multi-granularity context features and identity-aware patterns. Secondly, we present a generalized domain alignment learning head (GDALH), whose core idea is to enrich feature diversity during domain alignment, to relieve the modality discrepancy and enhance the generalization of MDANet. Finally, the entire network is trained end-to-end with the proposed cross-modality circle, classification, and domain alignment losses. We conduct comprehensive experiments on two standard VI-ReID datasets and their corrupted variants to validate the robustness and generalization of our approach. MDANet clearly outperforms state-of-the-art methods: it improves Rank-1 accuracy by 8.86% on SYSU-MM01 (all-search and single-shot mode) and by 2.50% on RegDB (infrared-to-visible mode). The source code will be made available soon.
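Since the source code has not yet been released, the exact form of the cross-modality circle loss is not public. As a minimal illustrative sketch only, the snippet below applies the standard circle-loss formulation (Sun et al., CVPR 2020) to cross-modal visible-infrared similarity pairs; the function name, signature, and default hyperparameters m and gamma are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cross_modality_circle_loss(vis_feats, ir_feats, vis_labels, ir_labels,
                               m=0.25, gamma=64.0):
    """Hypothetical sketch: circle loss over cross-modal (visible x infrared)
    similarity pairs, following the unified formulation of Sun et al. (2020).
    MDANet's actual cross-modality circle loss may differ."""
    vis = F.normalize(vis_feats, dim=1)   # (B_v, D) unit-norm embeddings
    ir = F.normalize(ir_feats, dim=1)     # (B_i, D)
    sim = vis @ ir.t()                    # cosine similarity matrix (B_v, B_i)
    pos = vis_labels.unsqueeze(1).eq(ir_labels.unsqueeze(0))  # same-identity mask

    # Adaptive weights and margins from the circle-loss paper:
    # O_p = 1 + m, O_n = -m, Delta_p = 1 - m, Delta_n = m.
    alpha_p = torch.clamp_min(1 + m - sim, 0.0).detach()
    alpha_n = torch.clamp_min(sim + m, 0.0).detach()
    logit_p = -gamma * alpha_p * (sim - (1 - m))
    logit_n = gamma * alpha_n * (sim - m)

    # Keep only cross-modal positives in the first term, negatives in the second.
    logit_p = logit_p.masked_fill(~pos, float('-inf'))
    logit_n = logit_n.masked_fill(pos, float('-inf'))

    # softplus(logsumexp_p + logsumexp_n) == log(1 + sum_p sum_n exp(...)),
    # i.e. the unified circle-loss objective per visible anchor.
    loss = F.softplus(torch.logsumexp(logit_p, dim=1) +
                      torch.logsumexp(logit_n, dim=1))
    return loss.mean()
```

With identity-balanced batches that sample each person in both modalities, every visible anchor has at least one infrared positive and one negative, so both log-sum-exp terms stay finite; anchors without positives contribute zero loss.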
Journal Introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.