{"title":"TriReID: Towards Multi-Modal Person Re-Identification via Descriptive Fusion Model","authors":"Yajing Zhai, Yawen Zeng, Da Cao, Shaofei Lu","doi":"10.1145/3512527.3531397","DOIUrl":null,"url":null,"abstract":"The cross-modal person re-identification (ReID) aims to retrieve one person from one modality to the other single modality, such as text-based and sketch-based ReID tasks. However, for these different modalities of describing a person, combining multiple aspects can obviously make full use of complementary information and improve the identification performance. Therefore, to explore how to comprehensively consider multi-modal information, we advance a novel multi-modal person re-identification task, which utilizes both text and sketch as a descriptive query to retrieve desired images. In fact, the textual description and the visual description are understood together to retrieve the person in the database to be more aligned with real-world scenarios, which is promising but seldom considered. Besides, based on an existing sketch-based ReID dataset, we construct a new dataset, TriReID, to support this challenging task in a semi-automated way. Particularly, we implement an image captioning model under the active learning paradigm to generate sentences suitable for ReID, in which the quality scores of the three levels are customized. Moreover, we propose a novel framework named Descriptive Fusion Model (DFM) to solve the multi-modal ReID issue. Specifically, we first develop a flexible descriptive embedding function to fuse the text and sketch modalities. Further, the fused descriptive semantic feature is jointly optimized under the generative adversarial paradigm to mitigate the cross-modal semantic gap. Extensive experiments on the TriReID dataset demonstrate the effectiveness and rationality of our proposed solution.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"183 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3512527.3531397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Cross-modal person re-identification (ReID) aims to retrieve a person in one modality using a query from another single modality, as in text-based and sketch-based ReID tasks. However, since these modalities describe a person in different ways, combining them can exploit complementary information and improve identification performance. To explore how to make comprehensive use of multi-modal information, we therefore propose a novel multi-modal person re-identification task that uses both a text and a sketch as a descriptive query to retrieve the desired images. Interpreting the textual and the visual description jointly when retrieving a person from the database is better aligned with real-world scenarios, yet this promising setting has seldom been considered. In addition, building on an existing sketch-based ReID dataset, we construct a new dataset, TriReID, to support this challenging task in a semi-automated way. In particular, we train an image captioning model under the active learning paradigm to generate sentences suitable for ReID, with customized quality scores at three levels. Moreover, we propose a novel framework named Descriptive Fusion Model (DFM) to address the multi-modal ReID problem. Specifically, we first develop a flexible descriptive embedding function to fuse the text and sketch modalities. The fused descriptive semantic feature is then jointly optimized under a generative adversarial paradigm to mitigate the cross-modal semantic gap. Extensive experiments on the TriReID dataset demonstrate the effectiveness and rationality of the proposed solution.
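The abstract describes two core components: a descriptive embedding that fuses text and sketch features into a single query, and adversarial optimization that aligns the fused query with the RGB image embedding space. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the module names, feature dimensions, gated-fusion design, and BCE-based adversarial losses are assumptions for illustration only, not the paper's actual DFM implementation.

```python
# Illustrative sketch only (not the authors' code): fuse text and sketch
# embeddings into one descriptive query, then adversarially align it with
# image embeddings. All dimensions and design choices are assumptions.
import torch
import torch.nn as nn

class DescriptiveFusion(nn.Module):
    """Fuse text and sketch features into one descriptive query embedding."""
    def __init__(self, text_dim=768, sketch_dim=512, embed_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.sketch_proj = nn.Linear(sketch_dim, embed_dim)
        # Gate decides, per dimension, how much to weight each modality.
        self.gate = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.Sigmoid())

    def forward(self, text_feat, sketch_feat):
        t = self.text_proj(text_feat)
        s = self.sketch_proj(sketch_feat)
        g = self.gate(torch.cat([t, s], dim=-1))
        return g * t + (1.0 - g) * s  # fused descriptive embedding

class ModalityDiscriminator(nn.Module):
    """Predicts whether an embedding comes from the query side or the image side."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, x):
        return self.net(x)  # logits: positive -> "image", negative -> "query"

if __name__ == "__main__":
    fusion = DescriptiveFusion()
    disc = ModalityDiscriminator()
    text_feat = torch.randn(4, 768)    # e.g., from a text encoder
    sketch_feat = torch.randn(4, 512)  # e.g., from a sketch CNN
    image_feat = torch.randn(4, 256)   # e.g., from an RGB person encoder
    query = fusion(text_feat, sketch_feat)

    # Adversarial alignment: the discriminator learns to tell query embeddings
    # from image embeddings, while the fusion network is trained to fool it,
    # pushing both modalities toward a shared semantic space.
    bce = nn.BCEWithLogitsLoss()
    d_loss = bce(disc(image_feat), torch.ones(4, 1)) + \
             bce(disc(query.detach()), torch.zeros(4, 1))
    g_loss = bce(disc(query), torch.ones(4, 1))  # fusion tries to look "image-like"
    print(d_loss.item(), g_loss.item())
```

In practice the generator (fusion) and discriminator losses would be optimized alternately, and a retrieval loss (e.g., identity classification or triplet loss) would be added so the shared space remains discriminative for ReID; those details are left out of this sketch.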