CRA: Text to Image Retrieval for Architecture Images by Chinese CLIP

Siyuan Wang, Yuyao Yan, Xi Yang, Kaizhu Huang
{"title":"CRA: Text to Image Retrieval for Architecture Images by Chinese CLIP","authors":"Siyuan Wang, Yuyao Yan, Xi Yang, Kaizhu Huang","doi":"10.1109/cmvit57620.2023.00015","DOIUrl":null,"url":null,"abstract":"Text-to-image retrieval is revolutionized since the Contrastive Language-Image Pre-training model was proposed. Most existing methods learn a latent representation of text and then align its embedding with the corresponding image’s embedding from an image encoder. Recently, several Chinese CLIP models have supported a good representation of paired image-text sets. However, adapting the pre-trained retrieval model to a professional domain still remains a challenge, mainly due to the large domain gap between the professional and general text-image sets. In this paper, we introduce a novel contrastive tuning model, named CRA, using Chinese texts to retrieve architecture-related images by fine-tuning the pre-trained Chinese CLIP. Instead of fine-tuning the whole CLIP model, we engage the Locked-image Text tuning (LiT) strategy to adapt the architecture-terminology sets by tuning the text encoder and freezing the pre-trained large-scale image encoder. We further propose a text-image dataset of architectural design. On the text-to-image retrieval task, we improve the metric of R@20 from 44.92% by the original Chinese CLIP model to 74.61% by our CRA model in the test set.","PeriodicalId":191655,"journal":{"name":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","volume":"225 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 7th International Conference on Machine Vision and Information Technology (CMVIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/cmvit57620.2023.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Text-to-image retrieval has been revolutionized since the Contrastive Language-Image Pre-training (CLIP) model was proposed. Most existing methods learn a latent representation of the text and align its embedding with the corresponding image embedding produced by an image encoder. Recently, several Chinese CLIP models have provided good representations of paired image-text sets. However, adapting a pre-trained retrieval model to a professional domain remains a challenge, mainly due to the large domain gap between professional and general text-image sets. In this paper, we introduce a novel contrastive tuning model, named CRA, which retrieves architecture-related images from Chinese texts by fine-tuning the pre-trained Chinese CLIP. Instead of fine-tuning the whole CLIP model, we adopt the Locked-image Text tuning (LiT) strategy to adapt to architecture-terminology sets by tuning the text encoder while freezing the pre-trained large-scale image encoder. We further propose a text-image dataset of architectural design. On the text-to-image retrieval task, our CRA model improves R@20 on the test set from 44.92% with the original Chinese CLIP model to 74.61%.
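To make the tuning strategy concrete, below is a minimal PyTorch sketch of LiT-style contrastive tuning and of the R@K metric as described above: the image tower is frozen and only the text tower is updated with a symmetric, CLIP-style contrastive loss. The encoder modules, function names, and data shapes here are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Sketch of LiT-style contrastive tuning: lock the image encoder, train the
# text encoder with a symmetric InfoNCE loss (assumed setup, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def lock_image_tower(image_encoder: nn.Module) -> None:
    """The LiT idea: freeze every parameter of the pre-trained image tower."""
    for p in image_encoder.parameters():
        p.requires_grad_(False)

def lit_tuning_step(image_encoder: nn.Module,
                    text_encoder: nn.Module,
                    images: torch.Tensor,      # (B, 3, H, W) batch of images
                    token_ids: torch.Tensor,   # (B, L) tokenized paired texts
                    logit_scale: torch.Tensor  # learnable log-temperature
                    ) -> torch.Tensor:
    """One tuning step: frozen image tower, trainable text tower."""
    with torch.no_grad():  # image embeddings come from the locked tower
        img_emb = F.normalize(image_encoder(images), dim=-1)
    txt_emb = F.normalize(text_encoder(token_ids), dim=-1)

    # Pairwise cosine similarities scaled by the temperature; matched pairs
    # sit on the diagonal, so the target for row i is index i.
    logits = logit_scale.exp() * txt_emb @ img_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over text->image and image->text directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

@torch.no_grad()
def recall_at_k(txt_emb: torch.Tensor, img_emb: torch.Tensor, k: int = 20) -> float:
    """R@K for text-to-image retrieval: fraction of text queries whose
    paired image appears among the k most similar images."""
    sims = F.normalize(txt_emb, dim=-1) @ F.normalize(img_emb, dim=-1).t()
    topk = sims.topk(k, dim=-1).indices                  # (N, k) image ranks
    gold = torch.arange(sims.size(0), device=sims.device).unsqueeze(-1)
    return (topk == gold).any(dim=-1).float().mean().item()
```

Under this setup, only the text-tower parameters (and the temperature) would be handed to the optimizer, which is what keeps the large pre-trained image representation intact while the text side adapts to architectural terminology.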