Auxiliary Representation Guided Network for Visible-Infrared Person Re-Identification

IF 8.4 1区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia Pub Date : 2024-12-25 DOI:10.1109/TMM.2024.3521773

Mengzan Qi;Sixian Chan;Chen Hang;Guixu Zhang;Tieyong Zeng;Zhi Li

{"title":"Auxiliary Representation Guided Network for Visible-Infrared Person Re-Identification","authors":"Mengzan Qi;Sixian Chan;Chen Hang;Guixu Zhang;Tieyong Zeng;Zhi Li","doi":"10.1109/TMM.2024.3521773","DOIUrl":null,"url":null,"abstract":"Visible-Infrared Person Re-identification aims to retrieve images of specific identities across modalities. To relieve the large cross-modality discrepancy, researchers introduce the auxiliary modality within the image space to assist modality-invariant representation learning. However, the challenge persists in constraining the inherent quality of generated auxiliary images, further leading to a bottleneck in retrieval performance. In this paper, we propose a novel Auxiliary Representation Guided Network (ARGN) to explore the potential of auxiliary representations, which are directly generated within the modality-shared embedding space. In contrast to the original visible and infrared representations, which contain information solely from their respective modalities, these auxiliary representations integrate cross-modality information by fusing both modalities. In our framework, we utilize these auxiliary representations as modality guidance to reduce the cross-modality discrepancy. First, we propose a High-quality Auxiliary Representation Learning (HARL) framework to generate identity-consistent auxiliary representations. The primary objective of our HARL is to ensure that auxiliary representations capture diverse modality information from both modalities while concurrently preserving identity-related discrimination. Second, guided by these auxiliary representations, we design an Auxiliary Representation Guided Constraint (ARGC) to optimize the modality-shared embedding space. By incorporating this constraint, the modality-shared embedding space is optimized to achieve enhanced intra-identity compactness and inter-identity separability, further improving the retrieval performance. In addition, to improve the robustness of our framework against the modality variation, we introduce a Part-based Adaptive Gaussian Module (PAGM) to adaptively extract discriminative information across modalities. Finally, extensive experiments are conducted to demonstrate the superiority of our method over state-of-the-art approaches on three VI-ReID datasets.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"340-355"},"PeriodicalIF":8.4000,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10814927/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Visible-Infrared Person Re-identification aims to retrieve images of specific identities across modalities. To relieve the large cross-modality discrepancy, researchers introduce the auxiliary modality within the image space to assist modality-invariant representation learning. However, the challenge persists in constraining the inherent quality of generated auxiliary images, further leading to a bottleneck in retrieval performance. In this paper, we propose a novel Auxiliary Representation Guided Network (ARGN) to explore the potential of auxiliary representations, which are directly generated within the modality-shared embedding space. In contrast to the original visible and infrared representations, which contain information solely from their respective modalities, these auxiliary representations integrate cross-modality information by fusing both modalities. In our framework, we utilize these auxiliary representations as modality guidance to reduce the cross-modality discrepancy. First, we propose a High-quality Auxiliary Representation Learning (HARL) framework to generate identity-consistent auxiliary representations. The primary objective of our HARL is to ensure that auxiliary representations capture diverse modality information from both modalities while concurrently preserving identity-related discrimination. Second, guided by these auxiliary representations, we design an Auxiliary Representation Guided Constraint (ARGC) to optimize the modality-shared embedding space. By incorporating this constraint, the modality-shared embedding space is optimized to achieve enhanced intra-identity compactness and inter-identity separability, further improving the retrieval performance. In addition, to improve the robustness of our framework against the modality variation, we introduce a Part-based Adaptive Gaussian Module (PAGM) to adaptively extract discriminative information across modalities. Finally, extensive experiments are conducted to demonstrate the superiority of our method over state-of-the-art approaches on three VI-ReID datasets.

查看原文本刊更多论文

可见-红外人再识别辅助表示引导网络

可见-红外人体再识别旨在跨模态检索特定身份的图像。为了缓解较大的跨模态差异，研究者在图像空间中引入辅助模态来辅助模态不变表征学习。然而，所生成的辅助图像的固有质量仍然受到限制，这进一步导致了检索性能的瓶颈。在本文中，我们提出了一种新的辅助表示引导网络（ARGN）来探索辅助表示的潜力，这些辅助表示直接在模态共享的嵌入空间中生成。原始的可见光和红外表征只包含各自模态的信息，与之相反，这些辅助表征通过融合两种模态来整合跨模态信息。在我们的框架中，我们利用这些辅助表示作为情态指导来减少跨情态差异。首先，我们提出了一个高质量辅助表征学习（HARL）框架来生成身份一致的辅助表征。我们的HARL的主要目标是确保辅助表征从两种模态中捕获不同的模态信息，同时保留与身份相关的歧视。其次，在辅助表示的指导下，我们设计了辅助表示引导约束（ARGC）来优化模态共享嵌入空间。通过结合这一约束，优化模态共享嵌入空间，增强同一性内的紧密性和同一性间的可分离性，进一步提高检索性能。此外，为了提高我们的框架对模态变化的鲁棒性，我们引入了一个基于部分的自适应高斯模块（PAGM）来自适应地提取模态间的判别信息。最后，进行了广泛的实验，以证明我们的方法在三个VI-ReID数据集上优于最先进的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Transactions on Multimedia 工程技术-电信学

CiteScore

11.70

自引率

11.00%

发文量

576

审稿时长

5.5 months

期刊介绍： The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.