Generalized Conditional Similarity Learning via Semantic Matching

IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-02-13 · DOI: 10.1109/TPAMI.2025.3535730

Yi Shi;Rui-Xiang Li;Le Gan;De-Chuan Zhan;Han-Jia Ye

Volume 47, Issue 5, pp. 3847-3862 · Journal Article · https://ieeexplore.ieee.org/document/10887026/

Citation count: 0

Abstract

The inherent complexity of image semantics engenders a fascinating variability in relationships between images. For instance, under a certain condition, two images may demonstrate similarity, while under different circumstances, the same pair could exhibit absolute dissimilarity. A single feature space is therefore insufficient for capturing the nuanced semantic relationships that exist between samples. Conditional Similarity Learning (CSL) aims to address this gap by learning multiple, distinct feature spaces. Existing approaches in CSL often fail to capture the intricate similarity relationships between samples across different semantic conditions, particularly in weakly-supervised settings where condition labels are absent during training. To address this limitation, we introduce Distance Induced Semantic COndition VERification NETwork (DiscoverNet), a unified framework designed to cater to a range of CSL scenarios: supervised CSL (sCSL), weakly-supervised CSL (wsCSL), and semi-supervised CSL (ssCSL). In addition to traditional linear projections, we also introduce a prompt learning technique utilizing a transformer encoding layer to create diverse embedding spaces. Our framework incorporates a Condition Match Module (CMM) that dynamically matches different training triplets with corresponding embedding spaces, adapting to varying levels of supervision. We also shed light on existing evaluation biases in wsCSL and introduce two novel criteria for a more robust evaluation. Through extensive experiments and visualizations on benchmark datasets such as UT-Zappos-50k and Celeb-A, we substantiate the efficacy and interpretability of DiscoverNet.
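The idea of scoring one triplet in several condition-specific embedding spaces can be illustrated with a minimal sketch. This is not the DiscoverNet implementation: the function names, the hinge margin, the toy projections, and the rule of picking the space with the smallest triplet loss when no condition label is given are all assumptions made here for illustration; the abstract does not specify how the Condition Match Module performs the matching.

```python
import numpy as np


def triplet_loss_per_condition(anchor, positive, negative, projections, margin=0.2):
    """Hinge triplet loss of one triplet in every condition-specific space.

    projections has shape (n_conditions, d_in, d_out); each slice is a
    hypothetical linear projection that defines one embedding space.
    """
    # Project each feature vector into all spaces at once: result is (c, d_out).
    a = np.einsum("d,cde->ce", anchor, projections)
    p = np.einsum("d,cde->ce", positive, projections)
    n = np.einsum("d,cde->ce", negative, projections)
    d_ap = np.linalg.norm(a - p, axis=1)  # anchor-positive distance per space
    d_an = np.linalg.norm(a - n, axis=1)  # anchor-negative distance per space
    return np.maximum(0.0, d_ap - d_an + margin)


def match_condition(losses, condition=None):
    """Return the embedding space assigned to a triplet.

    With a condition label (supervised case) the label is used directly.
    Without one (weakly-supervised case) this sketch ASSUMES the space with
    the smallest triplet loss is the best match.
    """
    if condition is not None:
        return condition
    return int(np.argmin(losses))


# Toy data: space 0 only looks at feature 0, space 1 only at feature 1.
projections = np.array([[[1.0], [0.0]],
                        [[0.0], [1.0]]])
anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 5.0])   # close to the anchor in feature 0 only
negative = np.array([3.0, 0.2])   # far from the anchor in feature 0 only

losses = triplet_loss_per_condition(anchor, positive, negative, projections)
print(losses, match_condition(losses))
```

In the toy data the pair (anchor, positive) is similar under the first space and dissimilar under the second, which mirrors the abstract's point that the same images can be similar under one condition and dissimilar under another.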

