Real-time analysis of social networks leveraging the flink framework

G. Marciani, M. Piu, M. Porretta, Matteo Nardelli, V. Cardellini
{"title":"Real-time analysis of social networks leveraging the flink framework","authors":"G. Marciani, M. Piu, M. Porretta, Matteo Nardelli, V. Cardellini","doi":"10.1145/2933267.2933517","DOIUrl":null,"url":null,"abstract":"In this paper, we present a solution to the DEBS 2016 Grand Challenge that leverages Apache Flink, an open source platform for distributed stream and batch processing. We design the system architecture focusing on the exploitation of parallelism and memory efficiency so to enable an effective processing of high volume data streams on a distributed infrastructure. Our solution to the first query relies on a distributed and fine-grain approach for updating the post scores and determining partial ranks, which are then merged into a single final rank. Furthermore, changes in the final rank are identified so to update the output only if needed. The second query efficiently represents in-memory the evolving social graph and uses a customized Bron-Kerbosch algorithm to identify the largest communities active on a topic. We leverage on an in-memory caching system to keep the largest connected components which have been previously identified by the algorithm, thus saving computational time. The experimental results show that, on a portion of the dataset large half that provided for the Grand Challenge, our system can process up to 400 tuples/s with an average latency of 2.5 ms for the first query, and up to 370 tuples/s with an average latency of 2.7 ms for the second query.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2933267.2933517","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In this paper, we present a solution to the DEBS 2016 Grand Challenge that leverages Apache Flink, an open source platform for distributed stream and batch processing. We design the system architecture focusing on the exploitation of parallelism and memory efficiency so to enable an effective processing of high volume data streams on a distributed infrastructure. Our solution to the first query relies on a distributed and fine-grain approach for updating the post scores and determining partial ranks, which are then merged into a single final rank. Furthermore, changes in the final rank are identified so to update the output only if needed. The second query efficiently represents in-memory the evolving social graph and uses a customized Bron-Kerbosch algorithm to identify the largest communities active on a topic. We leverage on an in-memory caching system to keep the largest connected components which have been previously identified by the algorithm, thus saving computational time. The experimental results show that, on a portion of the dataset large half that provided for the Grand Challenge, our system can process up to 400 tuples/s with an average latency of 2.5 ms for the first query, and up to 370 tuples/s with an average latency of 2.7 ms for the second query.
利用flink框架实时分析社交网络
在本文中,我们提出了一个利用Apache Flink(分布式流和批处理的开源平台)的DEBS 2016大挑战的解决方案。我们设计的系统架构侧重于利用并行性和内存效率,以便在分布式基础设施上有效处理大容量数据流。我们对第一个查询的解决方案依赖于一种分布式和细粒度的方法来更新帖子分数和确定部分排名,然后将其合并为单个最终排名。此外,还确定了最终排名的变化,以便仅在需要时更新输出。第二个查询有效地在内存中表示不断发展的社交图,并使用自定义的brown - kerbosch算法来识别在某个主题上活跃的最大社区。我们利用内存缓存系统来保留算法先前识别的最大连接组件,从而节省计算时间。实验结果表明,在为大挑战提供的数据集的一半大的部分上,我们的系统可以在第一次查询中处理多达400个元组/s,平均延迟为2.5 ms;在第二次查询中处理多达370个元组/s,平均延迟为2.7 ms。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信