Fast incremental SimRank on link-evolving graphs

2014 IEEE 30th International Conference on Data Engineering. Pub Date: 2014-05-19. DOI: 10.1109/ICDE.2014.6816660

Weiren Yu, Xuemin Lin, W. Zhang


Citations: 40

Abstract

SimRank is an arresting measure of node-pair similarity based on hyperlinks. It iteratively follows the concept that two nodes are similar if they are referenced by similar nodes. Real graphs are often large, and their links constantly evolve with small changes over time. This paper considers fast incremental computation of SimRank on link-evolving graphs. The prior approach [12] to this problem first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains this factorization under link updates, at the expense of exactness. Consequently, all node-pair similarities are estimated in O(r⁴n²) time on a graph of n nodes, where r is the target rank of the low-rank approximation, which in practice is not negligibly small. In this paper, we propose a novel fast incremental paradigm. (1) We characterize the SimRank update matrix ΔS, in response to every link update, via a rank-one Sylvester matrix equation. Building on this, we devise a fast incremental algorithm that computes the similarities of all n² node pairs in O(Kn²) time for K iterations. (2) We also propose an effective pruning technique that captures the "affected areas" of ΔS to skip unnecessary computations without loss of exactness. This further accelerates incremental SimRank computation to O(K(nd + |AFF|)) time, where d is the average in-degree of the old graph and |AFF| (≤ n²) is the size of the "affected areas" in ΔS; in practice, |AFF| ≪ n². Our empirical evaluations verify that our algorithm (a) outperforms the best known link-update algorithm [12], and (b) runs much faster than its batch counterpart when link updates are small.
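To make the idea of a rank-one update equation concrete, here is a minimal sketch under stated assumptions; it is our own illustration, not the paper's algorithm. We assume the matrix form S = c·Q·S·Qᵀ + (1 − c)·I, where Q is the backward transition matrix (Q[a, b] = 1/|I(a)| when b is an in-neighbor of a), and we assume a single edge insertion i → j. That insertion changes only row j of Q, so Q_new = Q + u·vᵀ is a rank-one change. Subtracting the old equation from the new one gives ΔS = c·Q_new·ΔS·Q_newᵀ + c·(u·wᵀ + w·uᵀ) with w = Q·S·v + ½(vᵀ·S·v)·u. Hence ΔS = M + Mᵀ, where M solves the rank-one equation M = c·Q_new·M·Q_newᵀ + c·u·wᵀ; this derivation may differ from the paper's exact formulation. All function names, the damping factor c = 0.6, and the fixed iteration count K = 50 are hypothetical choices. Dense NumPy arrays are used for readability, so the matrix-vector products here cost O(n²) rather than the O(nd) a sparse Q would give.

```python
import numpy as np


def backward_transition(adj):
    """Q[a, b] = 1/|I(a)| if b is an in-neighbour of a, else 0.

    adj[x, y] == 1 encodes a directed edge x -> y.
    """
    n = adj.shape[0]
    q = np.zeros((n, n))
    indeg = adj.sum(axis=0)  # column sums are in-degrees
    for a in range(n):
        if indeg[a] > 0:
            q[a, :] = adj[:, a] / indeg[a]
    return q


def batch_simrank(q, c=0.6, iters=50):
    """Batch baseline (assumed matrix form): iterate S <- c Q S Q^T + (1 - c) I."""
    n = q.shape[0]
    s = (1 - c) * np.eye(n)
    for _ in range(iters):
        s = c * q @ s @ q.T + (1 - c) * np.eye(n)
    return s


def rank_one_delta(q, i, j):
    """Hypothetical helper: (u, v) with Q_new = Q + u v^T after inserting
    a new edge i -> j (assumes the edge is not already present)."""
    n = q.shape[0]
    u = np.zeros(n)
    u[j] = 1.0  # only row j of Q changes
    e_i = np.zeros(n)
    e_i[i] = 1.0
    d_j = np.count_nonzero(q[j, :])  # old in-degree of j
    v = (e_i - q[j, :]) / (d_j + 1)  # new row j minus old row j
    return u, v


def incremental_simrank(q, s, i, j, c=0.6, iters=50):
    """Hypothetical sketch: update S for one edge insertion i -> j.

    Solves M = c Q_new M Q_new^T + c u w^T through its truncated series
    M = sum_k c^(k+1) (Q_new^k u)(Q_new^k w)^T, then Delta S = M + M^T.
    """
    u, v = rank_one_delta(q, i, j)
    q_new = q + np.outer(u, v)
    sv = s @ v  # avoid forming Q @ S densely
    w = q @ sv + 0.5 * (v @ sv) * u
    xi, eta = u, w
    m = np.zeros_like(s)
    for k in range(iters):
        m += c ** (k + 1) * np.outer(xi, eta)  # O(n^2) per iteration
        xi, eta = q_new @ xi, q_new @ eta      # O(nd) if q_new were sparse
    return q_new, s + m + m.T


# Toy graph: edges 0->1, 0->2, 1->3, 2->3; then insert the edge 3->1.
adj = np.zeros((4, 4))
for x, y in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    adj[x, y] = 1.0
q = backward_transition(adj)
s = batch_simrank(q)
q_new, s_new = incremental_simrank(q, s, 3, 1)
```

Because only row j of Q changes, u has a single nonzero entry. The nonzero patterns of the vectors Q_new^k·u and Q_new^k·w bound which entries of M can become nonzero. That is one plausible intuition for the "affected areas" the abstract prunes, but this sketch does not implement the paper's pruning.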


