Unified pre-training with pseudo infrared images for visible-infrared person re-identification

Impact Factor 3.0 · CAS Zone 4 (Computer Science) · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications · Pub Date: 2024-09-19 · DOI: 10.1007/s11042-024-20217-8

ZhiGang Liu, Yan Hu


Citations: 0

Abstract

In pre-training for visible-infrared person re-identification (VI-ReID), two main challenges arise: i) Domain disparities. A significant domain gap exists between the ImageNet data used by public pre-trained models and the person data specific to the VI-ReID task. ii) Insufficient samples. Because cross-modal paired samples are difficult to gather, large-scale datasets suitable for pre-training are currently scarce. To address these issues, we propose a new unified pre-training framework (UPPI). First, we established a large-scale visible-pseudo infrared paired sample repository (UnitCP) based on the existing visible person dataset, encompassing nearly 170,000 sample pairs. This repository not only significantly expands the training samples; pre-training on it also effectively bridges the domain disparities. To fully harness the potential of the repository, we also devised a feature fusion mechanism (CF\(^2\)) for pre-training. It leverages redundant features present in the paired images to steer the model towards cross-modal feature fusion. In addition, to adapt the model during fine-tuning to datasets lacking paired images, we introduced a center contrast loss (C\(^2\)). This loss guides the model to prioritize cross-modal features with consistent identities. Extensive experimental results on two standard benchmarks (SYSU-MM01 and RegDB) demonstrate that the proposed UPPI performs favorably against state-of-the-art methods.
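The abstract does not give the formula for the center contrast loss, so the following is only a minimal sketch of one plausible reading: compute a feature center per identity in each modality, then apply a symmetric InfoNCE-style contrast so that the visible and infrared centers of the same identity are pulled together and those of different identities are pushed apart. The function names, the `temperature` parameter, and the use of NumPy are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def identity_centers(features, labels):
    """Return (sorted identity ids, L2-normalised mean feature per identity)."""
    ids = np.unique(labels)
    centers = np.stack([features[labels == i].mean(axis=0) for i in ids])
    centers = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + 1e-12)
    return ids, centers


def center_contrast_loss(vis_feats, vis_labels, ir_feats, ir_labels, temperature=0.1):
    """Hypothetical center-level contrastive loss between two modalities.

    Assumes every identity in the batch appears in both modalities.
    """
    vis_ids, vis_c = identity_centers(vis_feats, vis_labels)
    ir_ids, ir_c = identity_centers(ir_feats, ir_labels)
    if not np.array_equal(vis_ids, ir_ids):
        raise ValueError("both modalities must contain the same identities")

    # Cosine similarity between every visible center and every infrared center.
    logits = vis_c @ ir_c.T / temperature

    def diagonal_cross_entropy(z):
        # Numerically stable log-softmax; the matching identity is on the diagonal.
        z = z - z.max(axis=1, keepdims=True)
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Symmetric: visible-to-infrared and infrared-to-visible.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))
```

Because the loss works on per-identity centers rather than on image pairs, a sketch like this needs no pixel-aligned visible-infrared pairs, which matches the stated goal of adapting to fine-tuning datasets that lack paired images. Calling `center_contrast_loss` on a batch whose two modalities agree on identity should give a smaller value than on a batch where the infrared labels are scrambled.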



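Source journal

Multimedia Tools and Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.20

Self-citation rate: 16.70%

Articles published: 2439

Review time: 9.2 months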

About the journal: Multimedia Tools and Applications publishes original research articles on multimedia development and system support tools as well as case studies of multimedia applications. It also features experimental and survey articles. The journal is intended for academics, practitioners, scientists and engineers who are involved in multimedia system research, design and applications. All papers are peer reviewed. Specific areas of interest include multimedia tools, multimedia applications, and prototype multimedia systems and platforms.