STRIP: stream learning of influence probabilities
Konstantin Kutzkov, A. Bifet, F. Bonchi, A. Gionis
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013-08-11. DOI: 10.1145/2487575.2487657
Citations: 50
Abstract
Influence-driven diffusion of information is a fundamental process in social networks. Learning the latent variables of such a process, i.e., the influence strength along each link, is a central question for understanding the structure and function of complex networks, modeling information cascades, and developing applications such as viral marketing. Motivated by modern microblogging platforms, such as Twitter, in this paper we study the problem of learning influence probabilities in a data-stream scenario, in which the network topology is relatively stable and the challenge for a learning algorithm is to keep up with a continuous stream of tweets using a small amount of time and memory. Our contribution is a number of randomized approximation algorithms, categorized according to the available space (superlinear, linear, and sublinear in the number of nodes n) and according to different models (landmark and sliding window). Among several results, we show that we can learn influence probabilities with one pass over the data, using O(n log n) space, in both the landmark model and the sliding-window model, and we further show that our algorithm is within a logarithmic factor of optimal. For truly large graphs, when one needs to operate with sublinear space, we show that we can still learn influence probabilities in one pass, assuming that we restrict our attention to the most active users. Our thorough experimental evaluation on large social graphs demonstrates that the empirical performance of our algorithms agrees with that predicted by the theory.
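To make the learning task concrete, below is a minimal sketch of the exact-counting baseline that streaming algorithms of this kind approximate: in one pass over a time-ordered action stream, it estimates the influence probability p(u, v) as the fraction of u's actions that are later repeated by u's follower v. This is not the paper's randomized O(n log n)-space algorithm (the sketch stores all item histories, so its space grows with the stream); the function and variable names are illustrative.

```python
from collections import defaultdict

def learn_influence(stream, followers):
    """Exact one-pass baseline for the landmark model (illustrative).

    stream: iterable of (user, item) actions in time order.
    followers: dict mapping user -> set of users who follow them.
    Returns {(u, v): estimate of p(u, v) = A_{u->v} / A_u}.
    """
    performed = defaultdict(set)    # item -> users who already acted on it
    actions = defaultdict(int)      # A_u: number of actions by u
    propagated = defaultdict(int)   # A_{u->v}: u's actions repeated by v

    for user, item in stream:
        # Credit every earlier actor on this item whom `user` follows.
        for influencer in performed[item]:
            if user in followers.get(influencer, ()):
                propagated[(influencer, user)] += 1
        performed[item].add(user)
        actions[user] += 1

    return {(u, v): c / actions[u] for (u, v), c in propagated.items()}

# Example: alice posts x, bob (a follower) reposts x, alice posts y
# with no repost, so p(alice, bob) = 1/2.
net = {"alice": {"bob"}}
log = [("alice", "x"), ("bob", "x"), ("alice", "y")]
print(learn_influence(log, net))    # {('alice', 'bob'): 0.5}
```

The space blow-up in `performed` is exactly what the paper's randomized sketches avoid: under the abstract's stated bounds, per-user summaries of size O(log n) suffice in both the landmark and sliding-window models, at the cost of approximate rather than exact counts.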