TriReID: Towards Multi-Modal Person Re-Identification via Descriptive Fusion Model

Yajing Zhai, Yawen Zeng, Da Cao, Shaofei Lu
{"title":"TriReID: Towards Multi-Modal Person Re-Identification via Descriptive Fusion Model","authors":"Yajing Zhai, Yawen Zeng, Da Cao, Shaofei Lu","doi":"10.1145/3512527.3531397","DOIUrl":null,"url":null,"abstract":"The cross-modal person re-identification (ReID) aims to retrieve one person from one modality to the other single modality, such as text-based and sketch-based ReID tasks. However, for these different modalities of describing a person, combining multiple aspects can obviously make full use of complementary information and improve the identification performance. Therefore, to explore how to comprehensively consider multi-modal information, we advance a novel multi-modal person re-identification task, which utilizes both text and sketch as a descriptive query to retrieve desired images. In fact, the textual description and the visual description are understood together to retrieve the person in the database to be more aligned with real-world scenarios, which is promising but seldom considered. Besides, based on an existing sketch-based ReID dataset, we construct a new dataset, TriReID, to support this challenging task in a semi-automated way. Particularly, we implement an image captioning model under the active learning paradigm to generate sentences suitable for ReID, in which the quality scores of the three levels are customized. Moreover, we propose a novel framework named Descriptive Fusion Model (DFM) to solve the multi-modal ReID issue. Specifically, we first develop a flexible descriptive embedding function to fuse the text and sketch modalities. Further, the fused descriptive semantic feature is jointly optimized under the generative adversarial paradigm to mitigate the cross-modal semantic gap. Extensive experiments on the TriReID dataset demonstrate the effectiveness and rationality of our proposed solution.","PeriodicalId":179895,"journal":{"name":"Proceedings of the 2022 International Conference on Multimedia Retrieval","volume":"183 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3512527.3531397","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Cross-modal person re-identification (ReID) aims to retrieve a person in one modality given a query from another single modality, as in text-based and sketch-based ReID. However, since these modalities describe the same person from different aspects, combining them can exploit complementary information and improve identification performance. Therefore, to explore how to comprehensively consider multi-modal information, we introduce a novel multi-modal person re-identification task that uses both text and a sketch as a descriptive query to retrieve the desired images. Understanding the textual and visual descriptions jointly to retrieve a person from the database is better aligned with real-world scenarios, yet it has seldom been studied. In addition, building on an existing sketch-based ReID dataset, we construct a new dataset, TriReID, in a semi-automated way to support this challenging task. Specifically, we train an image captioning model under an active learning paradigm to generate sentences suitable for ReID, with customized three-level quality scores. Moreover, we propose a novel framework, the Descriptive Fusion Model (DFM), to address multi-modal ReID. We first develop a flexible descriptive embedding function to fuse the text and sketch modalities; the fused descriptive semantic feature is then jointly optimized under a generative adversarial paradigm to mitigate the cross-modal semantic gap. Extensive experiments on the TriReID dataset demonstrate the effectiveness and rationality of the proposed solution.
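The abstract describes DFM only at a high level, so the following is a minimal sketch of the descriptive fusion idea rather than the authors' implementation: it assumes the fused query is a gated combination of projected text and sketch features, and adds a simple modality discriminator to stand in for the adversarial alignment step. All class names, feature dimensions (e.g., 768-d text and 2048-d sketch features), and the gating design are hypothetical.

```python
# Hypothetical sketch of descriptive fusion for multi-modal ReID.
# The gated fusion and discriminator below are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class DescriptiveFusion(nn.Module):
    """Fuse a text embedding and a sketch embedding into one descriptive query."""

    def __init__(self, text_dim=768, sketch_dim=2048, embed_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.sketch_proj = nn.Linear(sketch_dim, embed_dim)
        # Gate decides, per dimension, how much to trust each modality.
        self.gate = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.Sigmoid())

    def forward(self, text_feat, sketch_feat):
        t = self.text_proj(text_feat)      # (B, embed_dim)
        s = self.sketch_proj(sketch_feat)  # (B, embed_dim)
        g = self.gate(torch.cat([t, s], dim=-1))
        return g * t + (1.0 - g) * s       # fused descriptive embedding


class ModalityDiscriminator(nn.Module):
    """Adversarial critic: tries to tell fused queries from gallery image features."""

    def __init__(self, embed_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, feat):
        return self.net(feat)  # real/fake logit


if __name__ == "__main__":
    fusion = DescriptiveFusion()
    disc = ModalityDiscriminator()
    text_feat = torch.randn(4, 768)      # e.g., sentence-encoder features (assumed)
    sketch_feat = torch.randn(4, 2048)   # e.g., CNN sketch features (assumed)
    query = fusion(text_feat, sketch_feat)
    print(query.shape, disc(query).shape)  # torch.Size([4, 512]) torch.Size([4, 1])
```

In this sketch the gate lets the model weight whichever modality is more informative for a given query; a full implementation would additionally include the captioning-based text encoder, the sketch encoder, and the adversarial and retrieval objectives described in the paper.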