Comparing apples to oranges: a scalable solution with heterogeneous hashing

Mingdong Ou, Peng Cui, Fei Wang, Jun Wang, Wenwu Zhu, Shiqiang Yang
{"title":"Comparing apples to oranges: a scalable solution with heterogeneous hashing","authors":"Mingdong Ou, Peng Cui, Fei Wang, Jun Wang, Wenwu Zhu, Shiqiang Yang","doi":"10.1145/2487575.2487668","DOIUrl":null,"url":null,"abstract":"Although hashing techniques have been popular for the large scale similarity search problem, most of the existing methods for designing optimal hash functions focus on homogeneous similarity assessment, i.e., the data entities to be indexed are of the same type. Realizing that heterogeneous entities and relationships are also ubiquitous in the real world applications, there is an emerging need to retrieve and search similar or relevant data entities from multiple heterogeneous domains, e.g., recommending relevant posts and images to a certain Facebook user. In this paper, we address the problem of ``comparing apples to oranges'' under the large scale setting. Specifically, we propose a novel Relation-aware Heterogeneous Hashing (RaHH), which provides a general framework for generating hash codes of data entities sitting in multiple heterogeneous domains. Unlike some existing hashing methods that map heterogeneous data in a common Hamming space, the RaHH approach constructs a Hamming space for each type of data entities, and learns optimal mappings between them simultaneously. This makes the learned hash codes flexibly cope with the characteristics of different data domains. Moreover, the RaHH framework encodes both homogeneous and heterogeneous relationships between the data entities to design hash functions with improved accuracy. To validate the proposed RaHH method, we conduct extensive evaluations on two large datasets; one is crawled from a popular social media sites, Tencent Weibo, and the other is an open dataset of Flickr(NUS-WIDE). The experimental results clearly demonstrate that the RaHH outperforms several state-of-the-art hashing methods with significant performance gains.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"65","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2487575.2487668","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 65

Abstract

Although hashing techniques have been popular for the large scale similarity search problem, most of the existing methods for designing optimal hash functions focus on homogeneous similarity assessment, i.e., the data entities to be indexed are of the same type. Realizing that heterogeneous entities and relationships are also ubiquitous in the real world applications, there is an emerging need to retrieve and search similar or relevant data entities from multiple heterogeneous domains, e.g., recommending relevant posts and images to a certain Facebook user. In this paper, we address the problem of ``comparing apples to oranges'' under the large scale setting. Specifically, we propose a novel Relation-aware Heterogeneous Hashing (RaHH), which provides a general framework for generating hash codes of data entities sitting in multiple heterogeneous domains. Unlike some existing hashing methods that map heterogeneous data in a common Hamming space, the RaHH approach constructs a Hamming space for each type of data entities, and learns optimal mappings between them simultaneously. This makes the learned hash codes flexibly cope with the characteristics of different data domains. Moreover, the RaHH framework encodes both homogeneous and heterogeneous relationships between the data entities to design hash functions with improved accuracy. To validate the proposed RaHH method, we conduct extensive evaluations on two large datasets; one is crawled from a popular social media sites, Tencent Weibo, and the other is an open dataset of Flickr(NUS-WIDE). The experimental results clearly demonstrate that the RaHH outperforms several state-of-the-art hashing methods with significant performance gains.
比较苹果和橘子:一个具有异构哈希的可扩展解决方案
虽然哈希技术在大规模相似度搜索问题中得到了广泛应用,但现有的设计最优哈希函数的方法大多侧重于同构相似度评估,即要索引的数据实体是同一类型的。意识到异构实体和关系在现实世界的应用程序中也无处不在,从多个异构领域中检索和搜索相似或相关的数据实体是一个新兴的需求,例如,向某个Facebook用户推荐相关的帖子和图片。在本文中,我们解决了大规模设置下的“比较苹果和橘子”问题。具体来说,我们提出了一种新的关系感知异构哈希(RaHH),它为生成位于多个异构域中的数据实体的哈希码提供了一个通用框架。与现有的一些将异构数据映射到公共汉明空间的散列方法不同,RaHH方法为每种类型的数据实体构建一个汉明空间,并同时学习它们之间的最佳映射。这使得学习到的哈希码能够灵活地应对不同数据域的特点。此外,RaHH框架对数据实体之间的同构和异构关系进行编码,以提高散列函数的精度。为了验证提出的RaHH方法,我们在两个大型数据集上进行了广泛的评估;一个是从流行的社交媒体网站腾讯微博中抓取的,另一个是Flickr(NUS-WIDE)的开放数据集。实验结果清楚地表明,RaHH比几种最先进的散列方法性能更好。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信