Reducing Re-Indexing for Top-k Personalized PageRank Computation on Dynamic Graphs
Tsuyoshi Yamashita, Naoki Matsumoto, Kunitake Kaneko
IEEE Transactions on Big Data, vol. 11, no. 4, pp. 1707-1719, 2025. DOI: 10.1109/TBDATA.2024.3524833
Top-k Personalized PageRank (PPR) is a graph analysis method used to determine the $k$ most important nodes with respect to a source node. Building an index for each node is an effective way to speed up Top-k PPR computation. When index-based Top-k PPR methods are applied to dynamic graphs, the index becomes stale as edges are updated, and it must be corrected. Existing methods correct the index on every update to guarantee Top-k PPR accuracy, but they incur either heavy re-indexing computation or significant memory overhead. This paper proposes a method that achieves accuracy comparable to that of these guaranteed methods while substantially reducing re-indexing. It exploits the observation that index references are concentrated on nodes whose index is unlikely to change under edge updates. In particular, our method skips re-indexing as long as comparable accuracy is maintained. Furthermore, our method has the smallest memory overhead among existing index-based methods. The space complexity of the index is $\Theta(n + m)$, where $n$ and $m$ are the numbers of nodes and edges of the graph, respectively. Evaluation on real-world datasets shows that our method maintains a Normalized Discounted Cumulative Gain (NDCG) above 0.999 until 20% of the edges have been updated since index generation.
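To make the accuracy measure above concrete, the Python sketch below computes single-source PPR by plain power iteration, extracts the Top-k list, and scores a stale Top-k list (computed before edge updates) against exact scores on the updated graph using NDCG@k. This is a brute-force baseline for illustration, not the authors' index-based method, and it does not implement their re-indexing reduction. Several details are assumptions: the restart probability `alpha = 0.15`, dangling nodes sending their walk mass back to the source, and graded relevance defined as the exact PPR value (the paper's precise NDCG definition is not given here). The helper names `ppr_power_iteration`, `top_k`, `ndcg_at_k`, and `apply_edge_updates` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Adjacency list: node -> list of out-neighbors. Every node must appear as a key.
Graph = Dict[int, List[int]]


def ppr_power_iteration(graph: Graph, source: int, alpha: float = 0.15,
                        iters: int = 100) -> Dict[int, float]:
    """Single-source PPR by power iteration: pi = alpha*e_s + (1-alpha)*P^T pi.

    alpha is the restart probability (assumed value). Dangling nodes send
    their walk mass back to the source, one common convention (assumed).
    """
    pi = {v: 0.0 for v in graph}
    pi[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        nxt[source] += alpha  # restart mass; total mass stays 1
        for u, outs in graph.items():
            mass = (1.0 - alpha) * pi[u]
            if outs:
                share = mass / len(outs)
                for w in outs:
                    nxt[w] += share
            else:
                nxt[source] += mass  # dangling node: jump back to the source
        pi = nxt
    return pi


def top_k(scores: Dict[int, float], k: int) -> List[int]:
    """The k highest-scoring nodes, ties broken by node id."""
    return [v for v, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def ndcg_at_k(ranking: List[int], exact: Dict[int, float], k: int) -> float:
    """NDCG@k of `ranking`, using exact PPR values as graded relevance (assumed)."""
    dcg = sum(exact.get(v, 0.0) / math.log2(i + 2) for i, v in enumerate(ranking[:k]))
    ideal = sorted(exact.values(), reverse=True)[:k]
    idcg = sum(s / math.log2(i + 2) for i, s in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 1.0


def apply_edge_updates(graph: Graph, inserts: List[Tuple[int, int]],
                       deletes: List[Tuple[int, int]]) -> Graph:
    """Return a copy of `graph` with directed edges deleted, then inserted."""
    g = {u: list(outs) for u, outs in graph.items()}
    for u, w in deletes:
        if w in g.get(u, []):
            g[u].remove(w)
    for u, w in inserts:
        g.setdefault(u, []).append(w)
        g.setdefault(w, [])  # make sure the new endpoint exists as a node
    return g


if __name__ == "__main__":
    g0 = {0: [1, 2], 1: [2], 2: [0, 3], 3: [4], 4: [0]}
    k = 3
    stale = top_k(ppr_power_iteration(g0, source=0), k)  # Top-k list computed once
    g1 = apply_edge_updates(g0, inserts=[(0, 4), (4, 3)], deletes=[(0, 2)])
    fresh_scores = ppr_power_iteration(g1, source=0)
    print("stale top-k:", stale)
    print("fresh top-k:", top_k(fresh_scores, k))
    print("NDCG@k of stale list:", round(ndcg_at_k(stale, fresh_scores, k), 4))
```

An NDCG close to 1 means the stale list still orders nodes almost as the exact scores on the updated graph would; recomputing PPR from scratch after every update, as this baseline does, is exactly the cost that index-based methods aim to avoid.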
Journal description:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.