Triplet Bridge for Zero-Shot Sketch-Based Image Retrieval

Impact Factor 5.3 · JCR Q1 (Computer Science, Artificial Intelligence) · CAS Region 3 (Computer Science)
Jiahao Zheng;Yu Tang;Dapeng Wu
{"title":"Triplet Bridge for Zero-Shot Sketch-Based Image Retrieval","authors":"Jiahao Zheng;Yu Tang;Dapeng Wu","doi":"10.1109/TETCI.2024.3502430","DOIUrl":null,"url":null,"abstract":"Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) has always been a hard nut to crack due to the scarcity of sketch data and the abstract visual information contained in sketches. Previous works focus on designing various network architectures and using the gold standard triplet loss to solve ZS-SBIR, but they have always encountered obstacles in enhancing model generalization and extracting abstract visual information. In contrast, this work proposes a concise and effective Triplet Bridge (TriBri) framework to clear these obstacles fundamentally. Specifically, we use InfoNCE as the core to construct cross-modal representations between images and sketches, which can increase the margin between feature clusters with different categories in the representation space and improve the generalization of the model. Furthermore, we introduce text with abstract properties into the framework to construct a ternary relationship, and the three heterogeneous gaps between text, image, and sketch modalities are connected by InfoNCE. In this process, the common abstract visual cues in both images and sketches can be captured by the feature extractor with the guiding of text abstract information. Ultimately, comprehensive experiments on three commonly used datasets (i.e., TU-Berlin, Sketchy, and QuickDraw) validate that our framework can effectively solve these obstacles in a simple yet powerful manner. 
Furthermore, compared to state-of-the-art methods, the proposed TriBri exhibits comprehensive performance superiority.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 2","pages":"2014-2025"},"PeriodicalIF":5.3000,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10767753/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) has long been a hard nut to crack due to the scarcity of sketch data and the abstract visual information contained in sketches. Previous works focus on designing various network architectures and using the gold-standard triplet loss to solve ZS-SBIR, but they have consistently encountered obstacles in enhancing model generalization and extracting abstract visual information. In contrast, this work proposes a concise and effective Triplet Bridge (TriBri) framework to clear these obstacles fundamentally. Specifically, we use InfoNCE as the core to construct cross-modal representations between images and sketches, which increases the margin between feature clusters of different categories in the representation space and improves the generalization of the model. Furthermore, we introduce text with abstract properties into the framework to construct a ternary relationship, and the three heterogeneous gaps between the text, image, and sketch modalities are bridged by InfoNCE. In this process, the abstract visual cues common to both images and sketches can be captured by the feature extractor under the guidance of the abstract information in text. Ultimately, comprehensive experiments on three commonly used datasets (i.e., TU-Berlin, Sketchy, and QuickDraw) validate that our framework overcomes these obstacles in a simple yet powerful manner. Furthermore, compared to state-of-the-art methods, the proposed TriBri exhibits comprehensive performance superiority.
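The abstract's central mechanism is applying an InfoNCE contrastive loss to each of the three modality pairs (sketch–image, sketch–text, image–text). A minimal sketch of that idea, assuming normalized embeddings, positives on the batch diagonal, and a simple sum over the three pairwise bridges (the paper's actual encoders, temperature, and weighting are not specified here):

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric-batch InfoNCE: a[i] and b[i] are a positive pair,
    every other cross pair in the batch is a negative."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # L2-normalize rows
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature                    # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                  # NLL of the positives

# Hypothetical embeddings for the three modalities (batch of 8, dim 16)
rng = np.random.default_rng(0)
sketch, image, text = (rng.normal(size=(8, 16)) for _ in range(3))

# Three heterogeneous gaps, each bridged by InfoNCE
total_loss = info_nce(sketch, image) + info_nce(sketch, text) + info_nce(image, text)
```

With identical (perfectly aligned) embeddings the loss approaches zero, while independent random embeddings sit near log N per pair, which is the margin-widening behavior the abstract attributes to InfoNCE.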
Source journal metrics: CiteScore 10.30 · Self-citation rate 7.50% · Articles per year 147
About the journal: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain–computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.