Xu Cheng; Hao Yu; Kevin Ho Man Cheng; Zitong Yu; Guoying Zhao
IEEE Transactions on Multimedia, vol. 27, pp. 2015-2027. DOI: 10.1109/TMM.2024.3521822. Published: 2024-12-24 (Journal Article).
https://ieeexplore.ieee.org/document/10814078/
MDANet: Modality-Aware Domain Alignment Network for Visible-Infrared Person Re-Identification
Visible-infrared person re-identification (VI-ReID) is a challenging task in video surveillance. Most existing works achieve performance gains by aligning feature distributions or image styles across modalities, while multi-granularity information and domain knowledge are usually neglected. Motivated by these issues, we propose a novel modality-aware domain alignment network (MDANet) for VI-ReID, which exploits global-local context cues and a generalized domain alignment strategy to address modality discrepancy and poor generalization. Firstly, modality-aware global-local context attention (MGLCA) is proposed to capture multi-granularity context features and identity-aware patterns. Secondly, we present a generalized domain alignment learning head (GDALH), whose core idea is to enrich feature diversity during domain alignment, to relieve the modality discrepancy and enhance the generalization of MDANet. Finally, the entire network is trained end-to-end with the proposed cross-modality circle, classification, and domain alignment losses. We conduct comprehensive experiments on two standard VI-ReID datasets and their corrupted variants to validate the robustness and generalization of our approach. MDANet clearly outperforms state-of-the-art methods: it improves Rank-1 accuracy by 8.86% on SYSU-MM01 (all-search and single-shot mode) and by 2.50% on RegDB (infrared-to-visible mode). The source code will be made available soon.
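Since the source code has not yet been released, the exact form of the cross-modality circle loss is not public. As a minimal illustrative sketch only, the snippet below applies the standard circle-loss formulation (Sun et al., CVPR 2020) to cross-modal visible-infrared similarity pairs; the function name, signature, and default hyperparameters m and gamma are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cross_modality_circle_loss(vis_feats, ir_feats, vis_labels, ir_labels,
                               m=0.25, gamma=64.0):
    """Hypothetical sketch: circle loss over cross-modal (visible x infrared)
    similarity pairs, following the unified formulation of Sun et al. (2020).
    MDANet's actual cross-modality circle loss may differ."""
    vis = F.normalize(vis_feats, dim=1)   # (B_v, D) unit-norm embeddings
    ir = F.normalize(ir_feats, dim=1)     # (B_i, D)
    sim = vis @ ir.t()                    # cosine similarity matrix (B_v, B_i)
    pos = vis_labels.unsqueeze(1).eq(ir_labels.unsqueeze(0))  # same-identity mask

    # Adaptive weights and margins from the circle-loss paper:
    # O_p = 1 + m, O_n = -m, Delta_p = 1 - m, Delta_n = m.
    alpha_p = torch.clamp_min(1 + m - sim, 0.0).detach()
    alpha_n = torch.clamp_min(sim + m, 0.0).detach()
    logit_p = -gamma * alpha_p * (sim - (1 - m))
    logit_n = gamma * alpha_n * (sim - m)

    # Keep only cross-modal positives in the first term, negatives in the second.
    logit_p = logit_p.masked_fill(~pos, float('-inf'))
    logit_n = logit_n.masked_fill(pos, float('-inf'))

    # softplus(logsumexp_p + logsumexp_n) == log(1 + sum_p sum_n exp(...)),
    # i.e. the unified circle-loss objective per visible anchor.
    loss = F.softplus(torch.logsumexp(logit_p, dim=1) +
                      torch.logsumexp(logit_n, dim=1))
    return loss.mean()
```

With identity-balanced batches that sample each person in both modalities, every visible anchor has at least one infrared positive and one negative, so both log-sum-exp terms stay finite; anchors without positives contribute zero loss.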
Journal Introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.