优化地理分布数据流上的窗口聚合

Hooman Peiro Sajjad, Y. Liu, Vladimir Vlassov
{"title":"优化地理分布数据流上的窗口聚合","authors":"Hooman Peiro Sajjad, Y. Liu, Vladimir Vlassov","doi":"10.1109/EDGE.2018.00012","DOIUrl":null,"url":null,"abstract":"Real-time data analytics is essential since more and more applications require online decision making in a timely manner. However, efficient analysis of geo-distributed data streams is challenging. This is because data needs to be collected from all edge data centers, which aggregate data from local sources, in order to process most of the analytic tasks. Thus, most of the time edge data centers need to transfer data to a central data center over a wide area network, which is expensive. In this paper, we advocate for a coordinated approach of edge data centers in order to handle these analytic tasks efficiently and hence, reducing the communication cost among data centers. We focus on the windowed aggregation of data streams, which has been widely used in stream analytics. In general, aggregation of data streams among edge data centers in the same region reduces the amount of data that needs to be sent over cross-region communication links. Based on state-of-the-art research, we leverage intra-region links and design a low-overhead coordination algorithm that optimizes communication cost for data aggregation. Our algorithm has been evaluated using synthetic and Big Data Benchmark datasets. The evaluation results show that our algorithm reduces the bandwidth cost up to ~6x, as compared to the state-of-the-art solution.","PeriodicalId":396887,"journal":{"name":"2018 IEEE International Conference on Edge Computing (EDGE)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Optimizing Windowed Aggregation over Geo-Distributed Data Streams\",\"authors\":\"Hooman Peiro Sajjad, Y. Liu, Vladimir Vlassov\",\"doi\":\"10.1109/EDGE.2018.00012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Real-time data analytics is essential since more and more applications require online decision making in a timely manner. However, efficient analysis of geo-distributed data streams is challenging. This is because data needs to be collected from all edge data centers, which aggregate data from local sources, in order to process most of the analytic tasks. Thus, most of the time edge data centers need to transfer data to a central data center over a wide area network, which is expensive. In this paper, we advocate for a coordinated approach of edge data centers in order to handle these analytic tasks efficiently and hence, reducing the communication cost among data centers. We focus on the windowed aggregation of data streams, which has been widely used in stream analytics. In general, aggregation of data streams among edge data centers in the same region reduces the amount of data that needs to be sent over cross-region communication links. Based on state-of-the-art research, we leverage intra-region links and design a low-overhead coordination algorithm that optimizes communication cost for data aggregation. Our algorithm has been evaluated using synthetic and Big Data Benchmark datasets. The evaluation results show that our algorithm reduces the bandwidth cost up to ~6x, as compared to the state-of-the-art solution.\",\"PeriodicalId\":396887,\"journal\":{\"name\":\"2018 IEEE International Conference on Edge Computing (EDGE)\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Edge Computing (EDGE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/EDGE.2018.00012\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Edge Computing (EDGE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/EDGE.2018.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

摘要

实时数据分析是必不可少的,因为越来越多的应用程序需要及时的在线决策。然而,对地理分布数据流的有效分析是具有挑战性的。这是因为需要从所有边缘数据中心收集数据,这些数据中心聚合来自本地数据源的数据,以便处理大多数分析任务。因此,大多数时候,边缘数据中心需要通过广域网将数据传输到中心数据中心,这是昂贵的。在本文中,我们提倡边缘数据中心的协调方法,以便有效地处理这些分析任务,从而降低数据中心之间的通信成本。我们关注数据流的窗口聚合,这在流分析中已经得到了广泛的应用。通常情况下,在同一区域的边缘数据中心之间聚合数据流可以减少需要通过跨区域通信链路发送的数据量。基于最先进的研究,我们利用区域内链接并设计了一种低开销的协调算法,以优化数据聚合的通信成本。我们的算法已经使用合成和大数据基准数据集进行了评估。评估结果表明,与最先进的解决方案相比,我们的算法将带宽成本降低了约6倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Optimizing Windowed Aggregation over Geo-Distributed Data Streams
Real-time data analytics is essential since more and more applications require online decision making in a timely manner. However, efficient analysis of geo-distributed data streams is challenging. This is because data needs to be collected from all edge data centers, which aggregate data from local sources, in order to process most of the analytic tasks. Thus, most of the time edge data centers need to transfer data to a central data center over a wide area network, which is expensive. In this paper, we advocate for a coordinated approach of edge data centers in order to handle these analytic tasks efficiently and hence, reducing the communication cost among data centers. We focus on the windowed aggregation of data streams, which has been widely used in stream analytics. In general, aggregation of data streams among edge data centers in the same region reduces the amount of data that needs to be sent over cross-region communication links. Based on state-of-the-art research, we leverage intra-region links and design a low-overhead coordination algorithm that optimizes communication cost for data aggregation. Our algorithm has been evaluated using synthetic and Big Data Benchmark datasets. The evaluation results show that our algorithm reduces the bandwidth cost up to ~6x, as compared to the state-of-the-art solution.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信