Unsupervised Hashing with Contrastive Learning by Exploiting Similarity Knowledge and Hidden Structure of Data

Pub Date: 2023-10-26 | DOI: 10.1145/3581783.3612596

Zhenpeng Song, Qinliang Su, Jiayang Chen


Citations: 0

Abstract



Motivated by the strong performance of contrastive learning in representation learning, several recent works have used it to learn semantically rich hash codes. However, because label information is unavailable, existing contrastive hashing methods simply follow standard contrastive learning: they treat only an augmented view of the anchor as its positive and all other samples in the batch as negatives, and thus ignore a large number of potential positives. As a result, the learned hash codes tend to be scattered across the Hamming space, so their distances fail to reflect semantic similarity accurately. To address this issue, we propose to exploit the similarity knowledge and hidden structure of the dataset. Specifically, we first develop an intuitive self-training approach with two main components, a pseudo-label predictor and a hash code improving module, which benefit from each other by using each other's outputs together with similarity knowledge obtained from pre-trained models. We then recast this intuitive approach in a more rigorous probabilistic framework and propose CGHash, a probabilistic hashing model based on conditional generative models, which is better founded theoretically and can model the similarity knowledge and the hidden group structure more accurately. Extensive experiments on three image datasets show that CGHash significantly outperforms both the proposed intuitive approach and existing baselines. Our code is available at https://github.com/KARLSZP/CGHash.
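The abstract's central observation is that treating only an anchor's augmented view as positive pushes semantically similar samples apart. This can be checked on a toy batch. The Python sketch below illustrates that point under stated assumptions. It is not the CGHash implementation, which is in the linked repository. It compares a standard instance-level contrastive loss (NT-Xent, as popularized by SimCLR) on relaxed codes with a SupCon-style variant, in which samples that share a pseudo-label also count as positives. The pseudo-labels are hand-assigned here. In the paper they come from a pseudo-label predictor combined with similarity knowledge from pre-trained models. The function names, the use of cosine similarity, and the temperature `tau=0.5` are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence

Code = Sequence[float]


def sign_hash(z: Code) -> List[int]:
    """Binarize a relaxed (continuous) code into {-1, +1} bits; 0 maps to +1."""
    return [1 if v >= 0 else -1 for v in z]


def hamming(b1: Sequence[int], b2: Sequence[int]) -> int:
    """Number of bit positions in which two binary codes differ."""
    return sum(1 for x, y in zip(b1, b2) if x != y)


def cosine(u: Code, v: Code) -> float:
    """Cosine similarity; assumes both codes are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss(view1: List[Code], view2: List[Code],
                     pseudo_labels: Optional[List[int]] = None,
                     tau: float = 0.5) -> float:
    """Mean batch contrastive loss over the 2N relaxed codes of two augmented views.

    pseudo_labels=None: instance-level NT-Xent loss. The only positive of a code
    is the other view of the same image; all other 2N-2 codes are negatives.
    pseudo_labels given: hypothetical SupCon-style variant (an illustrative
    stand-in, not CGHash). Every code sharing the anchor's pseudo-label is a positive.
    """
    n = len(view1)
    z = list(view1) + list(view2)
    # Group id per code: the image index (instance-level) or its pseudo-label.
    groups = list(range(n)) * 2 if pseudo_labels is None else list(pseudo_labels) * 2
    total = 0.0
    for i in range(2 * n):
        others = [j for j in range(2 * n) if j != i]
        logits = {j: cosine(z[i], z[j]) / tau for j in others}
        # Numerically stable log-sum-exp over all non-anchor codes.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        positives = [j for j in others if groups[j] == groups[i]]
        # Average negative log-probability of the anchor's positives.
        total -= sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / (2 * n)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # images 0 and 1 share a class; so do images 2 and 3
    # "Dispersed": every image gets its own orthogonal code.
    dispersed = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
    # "Clustered": images with the same pseudo-label share a code.
    clustered = [[1.0] * 4, [1.0] * 4, [-1.0] * 4, [-1.0] * 4]
    for name, codes in [("dispersed", dispersed), ("clustered", clustered)]:
        inst = contrastive_loss(codes, codes)  # both views identical (idealized)
        pseudo = contrastive_loss(codes, codes, labels)
        print(f"{name}: instance={inst:.3f} pseudo-label={pseudo:.3f}")
```

The instance-level loss is lower for the dispersed layout than for the clustered one. In other words, it rewards scattering same-class codes, which is the failure mode the abstract describes. The pseudo-label variant prefers the clustered layout. In that layout, `sign_hash` gives same-class images identical bits (Hamming distance 0) and different-class images opposite bits.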
