Reducing Re-Indexing for Top-k Personalized PageRank Computation on Dynamic Graphs

IF 5.7, Region 3 (Computer Science), JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

IEEE Transactions on Big Data, Pub Date: 2025-01-01, DOI: 10.1109/TBDATA.2024.3524833

Tsuyoshi Yamashita;Naoki Matsumoto;Kunitake Kaneko

Volume 11, Issue 4, pages 1707-1719. Full text: https://ieeexplore.ieee.org/document/10819623/

Citations: 0

Abstract


Reducing Re-Indexing for Top-k Personalized PageRank Computation on Dynamic Graphs

Top-k Personalized PageRank (PPR) is a graph analysis method used to determine the $k$ most important nodes with respect to a source node. To realize fast Top-k PPR computation, indexing for each node is effective. When we apply the index-based Top-k PPR methods to dynamic graphs, the index becomes stale with edge updates, and index correction is required. Although the existing methods perform index correction for every update to guarantee Top-k PPR accuracy, they involve heavy re-indexing computation or significant memory overhead. This paper proposes a method that achieves comparable accuracy to guaranteed methods while significantly reducing re-indexing by focusing on the fact that index references are concentrated on the nodes whose index is unlikely to change due to edge updates. In particular, our method omits re-indexing as long as we achieve comparable accuracy. Furthermore, our method involves the minimum memory overhead among the existing index-based methods. The space complexity of the index is $\Theta(n + m)$, where $n$ and $m$ are the number of nodes and edges of the graph, respectively. The evaluation results using real-world datasets show that our method achieves more than 0.999 Normalized Discounted Cumulative Gain until 20% of edges are updated from index generation.
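To make the quantities in the abstract concrete, the following is a minimal sketch, not the paper's index-based method. It assumes plain power iteration for PPR, a restart probability of 0.15, and an NDCG variant that uses the exact PPR values as graded relevance; the abstract does not specify any of these. The function names and the toy graph are hypothetical. The sketch ranks nodes from scores computed before an edge insertion (standing in for a stale index) and scores that ranking against PPR recomputed on the updated graph.

```python
import math


def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Approximate PPR by power iteration (assumed baseline, not the paper's index).

    adj maps every node to a list of out-neighbors; alpha is the restart
    probability. Dangling nodes return their mass to the source.
    """
    nodes = list(adj)
    p = {v: 0.0 for v in nodes}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart mass (the scores always sum to 1)
        for u in nodes:
            share = (1.0 - alpha) * p[u]
            if adj[u]:
                w = share / len(adj[u])
                for v in adj[u]:
                    nxt[v] += w
            else:
                nxt[source] += share
        p = nxt
    return p


def top_k(scores, k):
    """Return the k highest-scoring nodes; ties are broken by node id."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


def ndcg_at_k(ranking, true_scores, k):
    """NDCG@k with the true PPR values as graded relevance (an assumption)."""
    dcg = sum(true_scores[v] / math.log2(i + 2)
              for i, v in enumerate(ranking[:k]))
    ideal = top_k(true_scores, k)
    idcg = sum(true_scores[v] / math.log2(i + 2)
               for i, v in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy graph: node -> list of out-neighbors.
graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0], 4: [0]}
stale_scores = personalized_pagerank(graph, source=0)  # stands in for a stale index
stale_top = top_k(stale_scores, 3)

# Apply one edge insertion (3 -> 4) and recompute from scratch.
updated = {u: list(vs) for u, vs in graph.items()}
updated[3].append(4)
fresh_scores = personalized_pagerank(updated, source=0)

print(stale_top, ndcg_at_k(stale_top, fresh_scores, 3))
```

Recomputing from scratch after each update is the expensive baseline here. The abstract's claim is that a stale index can keep this kind of NDCG score above 0.999 without such recomputation until about 20% of the edges have changed.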


Source journal

IEEE Transactions on Big Data Multiple-

CiteScore: 11.80

Self-citation rate: 2.80%

Articles published: 114

Journal description: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.