Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries

ACM Transactions on Database Systems (TODS) · Pub Date: 2019-08-28 · DOI: 10.1145/3360902

Sibo Wang, Renchi Yang, Runhui Wang, Xiaokui Xiao, Zhewei Wei, Wenqing Lin, Y. Yang, N. Tang

Journal Article · Volume 63 1 · Pages 1 - 37

Citations: 33

Abstract

Given a graph G, a source node s, and a target node t, the personalized PageRank (PPR) of t with respect to s is the probability that a random walk starting from s terminates at t. An important variant of the PPR query is single-source PPR (SSPPR), which enumerates all nodes in G and returns the top-k nodes with the highest PPR values with respect to a given source s. PPR in general and SSPPR in particular have important applications in web search and social networks, e.g., in Twitter’s Who-To-Follow recommendation service. However, PPR computation is known to be expensive on large graphs and resistant to indexing. Consequently, previous solutions either use heuristics, which do not guarantee result quality, or rely on the strong computing power of modern data centers, which is costly. Motivated by this, we propose effective index-free and index-based algorithms for approximate PPR processing, with rigorous guarantees on result quality. We first present FORA, an approximate SSPPR solution that combines two existing methods—Forward Push (which is fast but does not guarantee quality) and Monte Carlo Random Walk (accurate but slow)—in a simple and yet non-trivial way, leading to both high accuracy and efficiency. Further, FORA includes a simple and effective indexing scheme, as well as a module for top-k selection with high pruning power. Extensive experiments demonstrate that the proposed solutions are orders of magnitude more efficient than their respective competitors. Notably, on a billion-edge Twitter dataset, FORA answers a top-500 approximate SSPPR query within 1s, using a single commodity server.
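The combination described in the abstract can be illustrated with a minimal sketch. It assumes the textbook formulations of Forward Push and of random walks that stop with probability `alpha` at each step; it is not the authors' implementation. The names `forward_push`, `random_walk`, `approx_ssppr`, and `top_k`, the parameters `alpha`, `r_max`, and `omega`, and the treatment of dangling nodes are all assumptions made for illustration. The idea shown: Forward Push cheaply settles most of the probability mass into a `reserve`, and random walks are then spent only on the small `residue` that is left over.

```python
import math
import random
from collections import defaultdict, deque


def forward_push(graph, source, alpha, r_max):
    """Push residue forward until residue[v] / out_degree(v) <= r_max everywhere.

    graph: dict mapping every node to a list of its out-neighbours (assumed format).
    Returns (reserve, residue); reserve is a lower bound on the PPR values.
    """
    reserve = defaultdict(float)
    residue = defaultdict(float)
    residue[source] = 1.0
    queue, queued = deque([source]), {source}
    while queue:
        v = queue.popleft()
        queued.discard(v)
        out, r_v = graph[v], residue[v]
        if not out:
            # Assumed convention: a walk at a dangling node stops there.
            reserve[v] += r_v
            residue[v] = 0.0
            continue
        if r_v / len(out) <= r_max:
            continue
        reserve[v] += alpha * r_v          # mass of walks that stop at v
        residue[v] = 0.0
        share = (1.0 - alpha) * r_v / len(out)
        for u in out:
            residue[u] += share            # mass of walks that move on to u
            deg_u = len(graph[u])
            if u not in queued and (deg_u == 0 or residue[u] / deg_u > r_max):
                queue.append(u)
                queued.add(u)
    return reserve, residue


def random_walk(graph, start, alpha, rng):
    """Return the node at which an alpha-terminating random walk stops."""
    v = start
    while True:
        out = graph[v]
        if not out or rng.random() < alpha:
            return v
        v = rng.choice(out)


def approx_ssppr(graph, source, alpha=0.2, r_max=1e-3, omega=10_000, seed=0):
    """Forward Push first, then random walks only from nodes with leftover residue."""
    rng = random.Random(seed)
    reserve, residue = forward_push(graph, source, alpha, r_max)
    estimate = defaultdict(float, reserve)
    for v, r_v in residue.items():
        if r_v <= 0.0:
            continue
        n_walks = math.ceil(r_v * omega)   # more residue, more walks
        weight = r_v / n_walks             # each walk carries an equal share
        for _ in range(n_walks):
            estimate[random_walk(graph, v, alpha, rng)] += weight
    return dict(estimate)


def top_k(estimate, k):
    """Plain sort for illustration; not the pruning-based top-k module of the paper."""
    return sorted(estimate.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

For example, `top_k(approx_ssppr({0: [1, 2], 1: [2], 2: [0]}, source=0), 2)` returns the two nodes with the largest estimated PPR with respect to node 0. A smaller `r_max` shifts work toward the deterministic push phase, while a larger `omega` spends more walks on the remaining residue.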

Source journal

ACM Transactions on Database Systems (TODS)

Self-citation rate

0.00%

Number of articles published