Title: Triplet Bridge for Zero-Shot Sketch-Based Image Retrieval
Authors: Jiahao Zheng; Yu Tang; Dapeng Wu
DOI: 10.1109/TETCI.2024.3502430
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 2, pp. 2014-2025 (Q1, Computer Science, Artificial Intelligence; IF 5.3)
Publication date: 2024-11-26 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10767753/
Citations: 0
Abstract
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) has long been a difficult problem due to the scarcity of sketch data and the abstract visual information contained in sketches. Previous works focus on designing various network architectures and applying the gold-standard triplet loss, but they consistently struggle to enhance model generalization and to extract abstract visual information. In contrast, this work proposes a concise and effective Triplet Bridge (TriBri) framework that clears these obstacles fundamentally. Specifically, we use InfoNCE as the core to construct cross-modal representations between images and sketches, which increases the margin between feature clusters of different categories in the representation space and improves the generalization of the model. Furthermore, we introduce text, with its abstract properties, into the framework to construct a ternary relationship, and the three heterogeneous gaps between the text, image, and sketch modalities are bridged by InfoNCE. In this process, the abstract visual cues common to both images and sketches can be captured by the feature extractor under the guidance of the abstract information in text. Ultimately, comprehensive experiments on three commonly used datasets (i.e., TU-Berlin, Sketchy, and QuickDraw) validate that our framework effectively resolves these obstacles in a simple yet powerful manner. Furthermore, compared to state-of-the-art methods, the proposed TriBri exhibits comprehensive performance superiority.
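The paper's own implementation is not reproduced here; as a minimal, hypothetical sketch of the InfoNCE objective the abstract builds on (the function name, batch-wise in-batch negatives, and temperature value are illustrative assumptions, not the authors' code), a symmetric cross-modal contrastive loss between two embedding sets can be written as:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """Batch-wise InfoNCE: the i-th row of `positives` is the match for the
    i-th anchor; all other rows in the batch serve as negatives.
    (Illustrative sketch, not the TriBri reference implementation.)"""
    # L2-normalize so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # diagonal entries are the matched (positive) pairs
    return -np.mean(np.diag(log_probs))
```

In the ternary setup the abstract describes, one would presumably apply such a term pairwise across the image–sketch, image–text, and sketch–text gaps, so that pulling matched pairs together across all three modalities enlarges the inter-class margin in the shared representation space.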
Journal introduction:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication. TETCI publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.