Dynamic interaction graphs with probabilistic edge decay

2015 IEEE 31st International Conference on Data Engineering. Pub Date: 2015-10-23. DOI: 10.1109/ICDE.2015.7113363

Wenlei Xie, Yuanyuan Tian, Yannis Sismanis, Andrey Balmin, P. Haas


Citations: 17

Abstract

A large-scale network of social interactions, such as mentions on Twitter, can often be modeled as a “dynamic interaction graph” in which new interactions (edges) are continually added over time. Existing systems for extracting timely insights from such graphs are based on either a cumulative “snapshot” model or a “sliding window” model. The former model does not sufficiently emphasize recent interactions. The latter model abruptly forgets past interactions, leading to discontinuities in which, e.g., the graph analysis completely ignores historically important influencers who have temporarily gone dormant. We introduce TIDE, a distributed system for analyzing dynamic graphs that employs a new “probabilistic edge decay” (PED) model. In this model, the graph analysis algorithm of interest is applied at each time step to one or more graphs obtained as samples from the current “snapshot” graph, which comprises all interactions that have occurred so far. The probability that a given edge of the snapshot graph is included in a sample decays over time according to a user-specified decay function. The PED model allows controlled trade-offs between recency and continuity, and allows existing analysis algorithms for static graphs to be applied to dynamic graphs essentially without change. For the important class of exponential decay functions, we provide efficient methods that leverage past samples to incrementally generate new samples as time advances. We also exploit the large degree of overlap between samples to reduce memory consumption from O(N) to O(log N) when maintaining N sample graphs. Finally, we provide bulk-execution methods for applying graph algorithms to multiple sample graphs simultaneously without requiring any changes to existing graph-processing APIs. Experiments on a real Twitter dataset demonstrate the effectiveness and efficiency of our TIDE prototype, which is built on top of the Spark distributed computing framework.
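The incremental sampling idea for exponential decay can be illustrated with a minimal single-machine sketch. It is not the TIDE implementation; it assumes a decay function of the form f(a) = p^a for an edge of age a (in time steps) and relies on the memoryless property: if every edge already in the sample survives each step independently with probability p, and newly arrived edges are always included, then an edge of age a is in the sample with probability p^a. The names `advance` and `run` are hypothetical.

```python
import random


def advance(sample, new_edges, p, rng):
    """Advance one time step under an assumed decay f(age) = p ** age.

    sample    -- edges currently in the sample graph
    new_edges -- edges that arrived during this time step
    p         -- per-step survival probability, 0 <= p <= 1
    rng       -- a random.Random instance (passed in for reproducibility)
    """
    # Each edge already in the sample survives independently with probability p.
    kept = [edge for edge in sample if rng.random() < p]
    # Edges arriving now have age 0, so they are included with probability p**0 = 1.
    kept.extend(new_edges)
    return kept


def run(batches, p, seed=0):
    """Feed a sequence of edge batches (one per time step) through advance()."""
    rng = random.Random(seed)
    sample = []
    for batch in batches:
        sample = advance(sample, batch, p, rng)
    return sample


if __name__ == "__main__":
    # Three time steps of hypothetical "mention" edges (source, target).
    batches = [
        [("alice", "bob"), ("bob", "carol")],
        [("carol", "dave")],
        [("dave", "alice")],
    ]
    print(run(batches, p=0.8))
```

Because the new sample is derived from the previous one, no pass over the full snapshot graph is needed at each step. Setting p = 1 recovers the cumulative snapshot model, while p = 0 keeps only the most recent batch, which behaves like a sliding window of width one.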
