Dictionary Learning based Supervised Discrete Hashing for Cross-Media Retrieval

Ye Wu, Xin Luo, Xin-Shun Xu, Shanqing Guo, Yuliang Shi
Published in: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (ICMR '18)
Publication date: 2018-06-05
DOI: 10.1145/3206025.3206045
Citations: 10

Abstract

Hashing techniques have attracted considerable attention in large-scale multimedia retrieval due to their low storage cost and fast query speed, and many hashing models have been proposed for cross-modal retrieval tasks. However, several problems remain. First, most methods directly use a linear projection matrix to map heterogeneous data into a common space, which can introduce large error: some semantically similar heterogeneous instances are hard to bring close together in the latent space under a linear projection. Second, most existing cross-modal hashing methods preserve label information during learning with a simple pairwise similarity matrix, which cannot fully exploit the discriminative property of the labels. Third, most existing supervised methods solve a relaxed continuous optimization problem by dropping the discrete constraints, which may lead to large quantization error. To overcome these limitations, we propose a novel cross-modal hashing method called Dictionary Learning based Supervised Discrete Hashing (DLSDH). Specifically, it learns dictionaries and generates a sparse representation for every instance, which is better suited to projection into a latent space. To make full use of label information, it uses cosine similarity to construct a new pairwise similarity matrix that carries more information. Moreover, it learns the discrete hash codes directly instead of relaxing the discrete constraints. Extensive experiments on three benchmark datasets demonstrate that DLSDH outperforms several state-of-the-art methods on cross-modal retrieval tasks.
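The abstract's cosine-similarity pairwise matrix can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes instances carry binary multi-label vectors and that the similarity between two instances is the cosine of the angle between their label vectors, yielding graded values in [0, 1] rather than the 0/1 entries of a simple pairwise matrix.

```python
import numpy as np

def cosine_similarity_matrix(labels):
    """Pairwise cosine similarity between multi-label vectors.

    labels: (n, c) binary label matrix; row i is instance i's label vector.
    Returns an (n, n) matrix S with S[i, j] in [0, 1].
    """
    L = np.asarray(labels, dtype=float)
    norms = np.linalg.norm(L, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard: leave all-zero (unlabeled) rows at similarity 0
    Ln = L / norms           # row-normalize so the dot product is the cosine
    return Ln @ Ln.T
```

For example, two instances sharing one of two labels get similarity 1/√2 ≈ 0.707 instead of a hard 1, so partially overlapping label sets are distinguished from identical ones.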