Tracked-Vehicle Retrieval by Natural Language Descriptions with Multi-Contextual Adaptive Knowledge
H. Le, Quang Qui-Vinh Nguyen, Duc Trung Luu, Truc Thi-Thanh Chau, Nhat Minh Chung, Synh Viet-Uyen Ha
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2023. DOI: 10.1109/CVPRW59228.2023.00583
{"title":"Tracked-Vehicle Retrieval by Natural Language Descriptions with Multi-Contextual Adaptive Knowledge","authors":"H. Le, Quang Qui-Vinh Nguyen, Duc Trung Luu, Truc Thi-Thanh Chau, Nhat Minh Chung, Synh Viet-Uyen Ha","doi":"10.1109/CVPRW59228.2023.00583","DOIUrl":null,"url":null,"abstract":"This paper introduces our solution for Track 2 in AI City Challenge 2023. The task is tracked-vehicle retrieval by natural language descriptions with a real-world dataset of various scenarios and cameras. Our solution mainly focuses on four points: (1) To address the linguistic ambiguity in the language query, we leverage our proposed standardized version for text descriptions for the domain-adaptive training and post-processing stage. (2) Our baseline vehicle retrieval model utilizes CLIP to extract robust visual and textual feature representations to learn the unified cross-modal representations between textual and visual features. (3) Our proposed semi-supervised domain adaptive (SSDA) training method is leveraged to address the domain gap between the train and test set. (4) Finally, we propose a multi-contextual post-processing technique that prunes out the wrong results based on multi-contextual attributes information that effectively boosts the final retrieval results. Our proposed framework has yielded a competitive performance of 82.63% MRR accuracy on the test set, achieving 1st place in the competition. Codes will be available at https://github.com/zef1611/AIC23_NLRetrieval_HCMIU_CVIP","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"704 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPRW59228.2023.00583","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
This paper introduces our solution for Track 2 of the AI City Challenge 2023: tracked-vehicle retrieval by natural language descriptions, on a real-world dataset spanning diverse scenarios and cameras. Our solution focuses on four points. (1) To address linguistic ambiguity in the language queries, we leverage our proposed standardized text descriptions in both the domain-adaptive training and post-processing stages. (2) Our baseline vehicle retrieval model uses CLIP to extract robust visual and textual feature representations and learn a unified cross-modal representation of the two. (3) Our proposed semi-supervised domain-adaptive (SSDA) training method addresses the domain gap between the training and test sets. (4) Finally, we propose a multi-contextual post-processing technique that prunes incorrect results based on multi-contextual attribute information, effectively boosting the final retrieval results. Our framework yields a competitive Mean Reciprocal Rank (MRR) of 82.63% on the test set, achieving 1st place in the competition. Code will be available at https://github.com/zef1611/AIC23_NLRetrieval_HCMIU_CVIP
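To make point (2) concrete, here is a minimal sketch, not the authors' implementation, of CLIP-based text-to-vehicle retrieval: a natural-language query and one representative crop per tracked vehicle are embedded with an off-the-shelf CLIP model, and tracks are ranked by cosine similarity. The checkpoint name, the `rank_tracks` helper, and the one-crop-per-track simplification are illustrative assumptions.

```python
# Illustrative sketch of CLIP-based text-to-vehicle retrieval (assumed setup,
# not the authors' released code). One crop per track is a simplification;
# in practice a track's per-frame embeddings would be aggregated (e.g. averaged).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_tracks(query: str, track_crops: list[Image.Image]) -> list[int]:
    """Return track indices sorted from best to worst match for the query."""
    inputs = processor(text=[query], images=track_crops,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # text_embeds / image_embeds come back L2-normalized from CLIPModel,
    # so the dot product below is the cosine similarity.
    sims = (out.image_embeds @ out.text_embeds.T).squeeze(1)
    return sims.argsort(descending=True).tolist()

# Usage (hypothetical paths):
# ranked = rank_tracks("A silver SUV turns right at the intersection.",
#                      [Image.open(p) for p in crop_paths])
```

Under this framing, the reported MRR is simply the mean over test queries of 1/rank of the ground-truth track in such a ranking.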