Title: On Efficient Single-Source Personalized PageRank Computation in Online Social Networks
Authors: Victor Junqiu Wei; Di Jiang; Jason Chen Zhang
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 6, pp. 3598-3612
Publication date: 2025-03-24
Publication type: Journal Article
DOI: 10.1109/TKDE.2025.3551751
URL: https://ieeexplore.ieee.org/document/10937368/
Impact factor: 8.9 (JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Open access: no
Citations: 0
Abstract
The Single-Source Personalized PageRank (SSPPR) problem is widely used in information retrieval and recommendation systems. Traditional algorithms assume full knowledge of the network, making them inapplicable to online social networks (OSNs), where the topology is unknown, and users can only explore the network step by step via APIs. The only feasible approach for SSPPR in OSNs is Monte Carlo (MC) simulation, but traditional MC methods rely on static sampling, which lacks flexibility, delays feedback, and overestimates the number of required random walks. To address these limitations, we propose PANDA (Single-Source Personalized PageRank on OSNs with Rademacher Average), a progressive sampling algorithm. PANDA iteratively samples random walks in batches, estimating accuracy dynamically using Rademacher Average from statistical learning theory. This data-dependent approach allows for early termination once the desired accuracy is met. Additionally, PANDA features a dynamic sampling schedule to optimize efficiency. Empirical studies show that PANDA significantly outperforms existing methods, achieving the same accuracy with far greater efficiency.
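The progressive sampling idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' PANDA implementation: the abstract gives neither the exact Rademacher-based bound nor the sampling schedule, so the sketch substitutes a generic textbook-style deviation bound of the form 2R + 3·sqrt(ln(4/δ)/(2n)), a Monte Carlo estimate of the empirical Rademacher average R, and a geometric batch schedule. The names `walk_endpoint`, `rademacher_estimate`, `progressive_ssppr`, and `TOY_GRAPH` are hypothetical, and the `neighbors` callback stands in for an OSN API call.

```python
import math
import random
from collections import Counter

# Hypothetical toy graph standing in for an OSN; in practice `neighbors`
# would wrap an API call that returns a user's neighbor list.
TOY_GRAPH = {0: [1, 2], 1: [0], 2: [0]}


def walk_endpoint(source, neighbors, alpha, rng):
    """Run one random walk that stops with probability alpha at each step."""
    node = source
    while rng.random() >= alpha:
        nbrs = neighbors(node)
        if not nbrs:  # dead end: stop here (one of several possible conventions)
            break
        node = rng.choice(nbrs)
    return node


def rademacher_estimate(endpoints, rng, trials=20):
    """Monte Carlo estimate of the empirical Rademacher average of the
    indicator functions f_v(walk) = 1[walk ends at v].
    Taking the absolute value makes the estimate slightly conservative."""
    n = len(endpoints)
    total = 0.0
    for _ in range(trials):
        signed = Counter()
        for v in endpoints:
            signed[v] += rng.choice((-1, 1))  # random sign per sampled walk
        total += max(abs(s) for s in signed.values()) / n
    return total / trials


def progressive_ssppr(source, neighbors, alpha=0.2, eps=0.05, delta=0.05,
                      batch=1000, growth=2.0, max_walks=1_000_000, seed=0):
    """Sample walks in growing batches; stop once the bound drops below eps."""
    rng = random.Random(seed)
    endpoints = []
    round_no = 0
    while True:
        round_no += 1
        endpoints.extend(
            walk_endpoint(source, neighbors, alpha, rng) for _ in range(batch)
        )
        n = len(endpoints)
        # Split the failure probability across rounds (delta / 2^i), so the
        # per-round budgets sum to at most delta.
        delta_i = delta / 2 ** round_no
        # Assumed stand-in bound, not the one derived in the paper.
        bound = (2 * rademacher_estimate(endpoints, rng)
                 + 3 * math.sqrt(math.log(4 / delta_i) / (2 * n)))
        if bound <= eps or n >= max_walks:
            break
        batch = int(batch * growth)  # assumed geometric schedule
    counts = Counter(endpoints)
    return {v: c / n for v, c in counts.items()}, n, bound
```

Calling `progressive_ssppr(0, TOY_GRAPH.__getitem__)` returns the estimated PPR vector, the number of walks actually sampled, and the final value of the bound. Because the stopping test depends on the sampled walks rather than on a worst-case sample size fixed in advance, the loop can stop as soon as the data support the requested accuracy, which is the behavior the abstract attributes to the data-dependent approach.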
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.