Cross-Modal Retrieval Using Deep De-correlated Subspace Ranking Hashing

Kevin Joslyn, Kai Li, K. Hua
{"title":"Cross-Modal Retrieval Using Deep De-correlated Subspace Ranking Hashing","authors":"Kevin Joslyn, Kai Li, K. Hua","doi":"10.1145/3206025.3206066","DOIUrl":null,"url":null,"abstract":"Cross-modal hashing has become a popular research topic in recent years due to the efficiency of storing and retrieving high-dimensional multimodal data represented by compact binary codes. While most cross-modal hash functions use binary space partitioning functions (e.g. the sign function), our method uses ranking-based hashing, which is based on numerically stable and scale-invariant rank correlation measures. In this paper, we propose a novel deep learning architecture called Deep De-correlated Subspace Ranking Hashing (DDSRH) that uses feature-ranking methods to determine the hash codes for the image and text modalities in a common hamming space. Specifically, DDSRH learns a set of de-correlated nonlinear subspaces on which to project the original features, so that the hash code can be determined by the relative ordering of projected feature values in a given optimized subspace. The network relies upon a pre-trained deep feature learning network for each modality, and a hashing network responsible for optimizing the hash codes based on the known similarity of the training image-text pairs. Our proposed method includes both architectural and mathematical techniques designed specifically for ranking-based hashing in order to achieve de-correlation between the bits, bit balancing, and quantization. Finally, through extensive experimental studies on two widely-used multimodal datasets, we show that the combination of these techniques can achieve state-of the-art performance on several benchmarks.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3206025.3206066","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

Cross-modal hashing has become a popular research topic in recent years due to the efficiency of storing and retrieving high-dimensional multimodal data represented by compact binary codes. While most cross-modal hash functions use binary space partitioning functions (e.g., the sign function), our method uses ranking-based hashing, which is based on numerically stable and scale-invariant rank correlation measures. In this paper, we propose a novel deep learning architecture called Deep De-correlated Subspace Ranking Hashing (DDSRH) that uses feature-ranking methods to determine the hash codes for the image and text modalities in a common Hamming space. Specifically, DDSRH learns a set of de-correlated nonlinear subspaces on which to project the original features, so that the hash code can be determined by the relative ordering of projected feature values in a given optimized subspace. The network relies upon a pre-trained deep feature learning network for each modality, and a hashing network responsible for optimizing the hash codes based on the known similarity of the training image-text pairs. Our proposed method includes both architectural and mathematical techniques designed specifically for ranking-based hashing in order to achieve de-correlation between the bits, bit balancing, and quantization. Finally, through extensive experimental studies on two widely used multimodal datasets, we show that the combination of these techniques can achieve state-of-the-art performance on several benchmarks.
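To make the core idea concrete, the following is a minimal NumPy sketch of ranking-based subspace hashing: each hash symbol is the index of the largest projected feature value within one subspace, so the code depends only on the relative ordering of projections, not their scale. The function name, array shapes, and random projections here are illustrative assumptions; the actual DDSRH model learns de-correlated nonlinear subspaces with a deep network rather than using fixed random ones.

```python
import numpy as np

def ranking_subspace_hash(features, projections):
    """Illustrative sketch (not the authors' implementation).

    features:    (d,) feature vector for one sample, e.g. a deep image
                 or text embedding.
    projections: (k, m, d) array; k subspaces, each spanned by m
                 projection directions (hypothetical shapes).

    Returns k rank-based symbols. Because argmax is invariant to any
    positive rescaling of the features, the code is scale-invariant,
    unlike a sign-based space-partitioning hash.
    """
    codes = []
    for W in projections:               # W: (m, d), one subspace
        z = W @ features                # m projected feature values
        codes.append(int(np.argmax(z)))  # symbol = index of the max value
    return codes

# Toy usage with random data (illustration only)
rng = np.random.default_rng(0)
x = rng.standard_normal(128)            # stand-in for a deep feature
P = rng.standard_normal((8, 4, 128))    # 8 subspaces, 4 directions each
print(ranking_subspace_hash(x, P))      # e.g. [1, 3, 0, 2, ...]
print(ranking_subspace_hash(5.0 * x, P))  # identical: scale-invariant
```

Note how multiplying the feature vector by a positive constant leaves the code unchanged; this is the numerical stability and scale invariance of rank correlation measures that the abstract contrasts with sign-function hashing.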