Efficient similarity computation for collaborative filtering in dynamic environments

Proceedings of the 13th ACM Conference on Recommender Systems. Pub Date: 2019-09-10. DOI: 10.1145/3298689.3347017

Olivier Jeunen, Koen Verstrepen, Bart Goethals


Citations: 6

Abstract

Computing all pairwise similarities in a large collection of vectors is a well-known and common data mining task. As the number and dimensionality of these vectors keep increasing, however, existing approaches are often unable to meet the strict efficiency requirements imposed by the environments in which they must operate. Real-time neighbourhood-based collaborative filtering (CF) is one example of such an environment, in which performance is critical. In this work, we present a novel algorithm for efficient and exact similarity computation between sparse, high-dimensional vectors. Our approach exploits the sparsity that is inherent to implicit feedback data streams, yielding significant gains compared to other methods. Furthermore, as our model learns incrementally, it is naturally suited for dynamic real-time CF environments. We propose a MapReduce-inspired parallelisation procedure along with our method, and show how even more speed-up can be achieved. Additionally, in many real-world systems, many items are not recommendable at any given time, due to recency, stock, seasonality, or enforced business rules. We exploit this fact to further improve the computational efficiency of our approach. Experimental evaluation on both real-world and publicly available datasets shows that our approach scales up to millions of processed user-item interactions per second, and advances the state of the art.
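
To make the idea of incremental, exact similarity computation over a sparse implicit-feedback stream more concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes binary feedback and item-item cosine similarity, and it maintains co-occurrence counts as interactions arrive. The class name `IncrementalItemCosine`, its methods, and the rule that a pair is skipped when neither item is recommendable are all illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt


class IncrementalItemCosine:
    """Hypothetical sketch: exact item-item cosine on binary implicit feedback,
    updated one (user, item) interaction at a time."""

    def __init__(self, recommendable=None):
        self.user_items = defaultdict(set)  # user -> items consumed so far
        self.item_count = defaultdict(int)  # item -> number of distinct users
        self.cooc = defaultdict(int)        # (i, j) with i < j -> shared users
        # Optional set of currently recommendable items (an assumption here).
        self.recommendable = recommendable

    def _relevant(self, i, j):
        # Assumed rule: keep a pair if at least one item is recommendable.
        if self.recommendable is None:
            return True
        return i in self.recommendable or j in self.recommendable

    def update(self, user, item):
        """Process one interaction; only touches pairs in this user's history."""
        seen = self.user_items[user]
        if item in seen:
            return  # binary feedback: repeated interactions add nothing
        for other in seen:
            if self._relevant(item, other):
                key = (item, other) if item < other else (other, item)
                self.cooc[key] += 1
        seen.add(item)
        self.item_count[item] += 1

    def similarity(self, i, j):
        """Cosine between binary item vectors: |U_i & U_j| / sqrt(|U_i| * |U_j|)."""
        n_i = self.item_count.get(i, 0)
        n_j = self.item_count.get(j, 0)
        if n_i == 0 or n_j == 0:
            return 0.0
        if i == j:
            return 1.0
        key = (i, j) if i < j else (j, i)
        return self.cooc.get(key, 0) / sqrt(n_i * n_j)


model = IncrementalItemCosine()
stream = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u1", "a")]
for user, item in stream:
    model.update(user, item)
print(model.similarity("a", "b"))
```

Each update costs time proportional to the length of the interacting user's history, which is where sparsity helps in this sketch. Because the per-pair counts are additive, partial counts computed on disjoint slices of the stream could in principle be summed in a reduce-like step, provided every interaction of a given user is routed to the same slice.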

