SHRec: Scalable Holistic Recommendation

Ahmed M. Aly, M. Hammad, Amr Ahmed
{"title":"SHRec: Scalable Holistic Recommendation","authors":"Ahmed M. Aly, M. Hammad, Amr Ahmed","doi":"10.1145/3085504.3085523","DOIUrl":null,"url":null,"abstract":"The problem of recommending items to users is of high practical importance. For instance, many web services try to find relevant recommendations for the users, e.g., finding relevant movies, social-media friends, restaurants, shopping items, etc. The expansion of the Web and the ever-growing number of people who use web services render the problem of recommendation challenging. The Locality Sensitive Hashing (LSH, for short) is the most known scalable technique for nearest-neighbor search in high dimensional data, and hence the LSH is widely used in most industrial recommendation systems. This paper presents an implementation of the LSH using Google's MapReduce engine. We apply the LSH to a real case study at Google, where we recommend for each web-host a set of outlinks based on the outlink similarity amongst the web-hosts. We identify some performance limitations of the LSH that occur due to specific properties in the data, and that become significant when the scale of the data is large. Furthermore, we present SHRec, a novel technique for scalable recommendation that addresses these performance limitations. Based on real deployment of both SHRec and LSH on Google's infrastructure, and using real data of the crawled Web at Google, where a sample host-level graph of 1.5 Billion web-hosts is extracted, we demonstrate that SHRec is more scalable than LSH. In particular, we show that SHRec is one order of magnitude faster than LSH while achieving better recommendation quality.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3085504.3085523","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The problem of recommending items to users is of high practical importance. For instance, many web services try to find relevant recommendations for the users, e.g., finding relevant movies, social-media friends, restaurants, shopping items, etc. The expansion of the Web and the ever-growing number of people who use web services render the problem of recommendation challenging. The Locality Sensitive Hashing (LSH, for short) is the most known scalable technique for nearest-neighbor search in high dimensional data, and hence the LSH is widely used in most industrial recommendation systems. This paper presents an implementation of the LSH using Google's MapReduce engine. We apply the LSH to a real case study at Google, where we recommend for each web-host a set of outlinks based on the outlink similarity amongst the web-hosts. We identify some performance limitations of the LSH that occur due to specific properties in the data, and that become significant when the scale of the data is large. Furthermore, we present SHRec, a novel technique for scalable recommendation that addresses these performance limitations. Based on real deployment of both SHRec and LSH on Google's infrastructure, and using real data of the crawled Web at Google, where a sample host-level graph of 1.5 Billion web-hosts is extracted, we demonstrate that SHRec is more scalable than LSH. In particular, we show that SHRec is one order of magnitude faster than LSH while achieving better recommendation quality.
SHRec:可扩展的整体推荐
向用户推荐商品的问题具有很高的实际重要性。例如,许多web服务试图为用户找到相关的推荐,例如,寻找相关的电影、社交媒体朋友、餐馆、购物项目等。Web的扩展和使用Web服务的人数的不断增长使得推荐问题具有挑战性。局域敏感散列(Locality Sensitive hash,简称LSH)是高维数据中最近邻搜索的可扩展技术,因此LSH被广泛应用于大多数工业推荐系统中。本文介绍了使用Google的MapReduce引擎实现LSH的方法。我们将LSH应用到谷歌的一个真实案例研究中,在那里我们根据网络主机之间的外链相似性为每个网络主机推荐一组外链。我们确定了LSH的一些性能限制,这些限制是由于数据中的特定属性造成的,当数据规模很大时,这些限制就会变得很重要。此外,我们提出了一种新的可扩展推荐技术SHRec,以解决这些性能限制。基于SHRec和LSH在Google基础设施上的实际部署,并使用Google抓取的Web的真实数据,其中提取了15亿个Web主机的示例主机级图,我们证明了SHRec比LSH更具可扩展性。特别是,我们表明SHRec比LSH快一个数量级,同时获得更好的推荐质量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信