{"title":"CAMeL: Cross-Modality Adaptive Meta-Learning for Text-Based Person Retrieval","authors":"Hang Yu;Jiahao Wen;Zhedong Zheng","doi":"10.1109/TIFS.2025.3565392","DOIUrl":null,"url":null,"abstract":"Text-based person retrieval aims to identify specific individuals within an image database using textual descriptions. Due to the high cost of annotation and privacy protection, researchers resort to synthesized data for the paradigm of pretraining and fine-tuning. However, these generated data often exhibit domain biases in both images and textual annotations, which largely compromise the scalability of the pre-trained model. Therefore, we introduce a domain-agnostic pretraining framework based on Cross-modality Adaptive Meta-Learning (CAMeL) to enhance the model generalization capability during pretraining to facilitate the subsequent downstream tasks. In particular, we develop a series of tasks that reflect the diversity and complexity of real-world scenarios, and introduce a dynamic error sample memory unit to memorize the history for errors encountered within multiple tasks. To further ensure multi-task adaptation, we also adopt an adaptive dual-speed update strategy, balancing fast adaptation to new tasks and slow weight updates for historical tasks. Albeit simple, our proposed model not only surpasses existing state-of-the-art methods on real-world benchmarks, including CUHK-PEDES, ICFG-PEDES, and RSTPReid, but also showcases robustness and scalability in handling biased synthetic images and noisy text annotations. Our code is available at <uri>https://github.com/Jahawn-Wen/CAMeL-reID</uri>","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"4651-4663"},"PeriodicalIF":6.3000,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10980229/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
Text-based person retrieval aims to identify specific individuals within an image database using textual descriptions. Due to the high cost of annotation and privacy-protection concerns, researchers resort to synthesized data for the pretraining and fine-tuning paradigm. However, these generated data often exhibit domain biases in both images and textual annotations, which largely compromise the scalability of the pre-trained model. Therefore, we introduce a domain-agnostic pretraining framework based on Cross-modality Adaptive Meta-Learning (CAMeL) to enhance the model's generalization capability during pretraining and facilitate subsequent downstream tasks. In particular, we develop a series of tasks that reflect the diversity and complexity of real-world scenarios, and introduce a dynamic error sample memory unit to memorize the history of errors encountered across multiple tasks. To further ensure multi-task adaptation, we also adopt an adaptive dual-speed update strategy, balancing fast adaptation to new tasks and slow weight updates for historical tasks. Albeit simple, our proposed model not only surpasses existing state-of-the-art methods on real-world benchmarks, including CUHK-PEDES, ICFG-PEDES, and RSTPReid, but also showcases robustness and scalability in handling biased synthetic images and noisy text annotations. Our code is available at https://github.com/Jahawn-Wen/CAMeL-reID.
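To make the two mechanisms named in the abstract more concrete, below is a minimal sketch of how a dual-speed update paired with an error-sample memory could look, interpreted in a Reptile-style meta-learning loop: fast inner-loop adaptation on each sampled task, followed by a slow interpolation of the historical (meta) weights toward the adapted weights, with the hardest samples replayed from a memory buffer. The `ErrorMemory` class, the toy regression tasks, and hyperparameters such as `fast_lr` and `slow_momentum` are illustrative assumptions for exposition, not the authors' implementation (see the linked repository for that).

```python
# Sketch of a dual-speed (fast/slow) update with an error-sample memory.
# All names and values here are illustrative, not taken from the CAMeL code.
import copy
import random
import torch
import torch.nn as nn


class ErrorMemory:
    """Fixed-size buffer that keeps the highest-loss (hardest) samples seen so far."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.buffer = []  # list of (loss_value, x, y)

    def update(self, losses, x, y):
        for l, xi, yi in zip(losses.tolist(), x, y):
            self.buffer.append((l, xi, yi))
        # keep only the hardest samples
        self.buffer.sort(key=lambda item: item[0], reverse=True)
        self.buffer = self.buffer[: self.capacity]

    def sample(self, k=16):
        if not self.buffer:
            return None
        batch = random.sample(self.buffer, min(k, len(self.buffer)))
        return torch.stack([b[1] for b in batch]), torch.stack([b[2] for b in batch])


def make_task():
    """Toy 'task': regress y = a*x + b with task-specific a, b (stand-in for a real retrieval task)."""
    a, b = random.uniform(-2, 2), random.uniform(-1, 1)
    x = torch.randn(32, 1)
    return x, a * x + b


slow_model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
memory = ErrorMemory(capacity=64)
fast_lr, slow_momentum, inner_steps = 1e-2, 0.1, 5  # illustrative values

for step in range(100):
    x, y = make_task()

    # --- fast adaptation: clone the slow (historical) weights and adapt quickly ---
    fast_model = copy.deepcopy(slow_model)
    opt = torch.optim.SGD(fast_model.parameters(), lr=fast_lr)
    for _ in range(inner_steps):
        per_sample = (fast_model(x) - y).pow(2).mean(dim=1)
        loss = per_sample.mean()
        # replay hard samples remembered from earlier tasks, if any
        replay = memory.sample()
        if replay is not None:
            rx, ry = replay
            loss = loss + 0.5 * nn.functional.mse_loss(fast_model(rx), ry)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # remember this task's hardest samples for future replay
    with torch.no_grad():
        per_sample = (fast_model(x) - y).pow(2).mean(dim=1)
    memory.update(per_sample, x, y)

    # --- slow update: move historical weights a small step toward the adapted ones ---
    with torch.no_grad():
        for slow_p, fast_p in zip(slow_model.parameters(), fast_model.parameters()):
            slow_p.add_(slow_momentum * (fast_p - slow_p))
```

The design intent mirrored here is that the small interpolation factor keeps the historical weights stable across tasks, while the cloned fast model absorbs task-specific shifts; the memory biases later adaptation toward previously misfit samples.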
Journal Description:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.