Iterative Scheduling for Distributed Stream Processing Systems

Leila Eskandari, J. Mair, Zhiyi Huang, D. Eyers
Published in: Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems, 2018-06-25
DOI: 10.1145/3210284.3219768 (https://doi.org/10.1145/3210284.3219768)
Citations: 14

Abstract

Data stream processing systems must handle large volumes of data efficiently in near real time. To achieve this, the schedulers within such systems minimise data movement between highly communicating tasks, which improves system throughput. However, finding an optimal schedule for these systems is NP-hard. In this research, we propose a heuristic scheduling algorithm that reliably and efficiently finds the highly communicating tasks by exploiting graph partitioning algorithms and a mathematical optimisation software package. We evaluate our scheduler against two popular existing schedulers, R-Storm and Aniello et al.'s 'Online scheduler', using two real-world applications. Our proposed scheduler outperforms both because it finds a more efficient schedule: throughput increases by 3% to 30% over R-Storm and by 20% to 86% over the Online scheduler.
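
The objective named in the abstract, keeping highly communicating tasks on the same machine, can be illustrated with a minimal sketch. This is not the authors' algorithm: it uses neither a graph partitioning library nor an optimisation package, and simply visits the heaviest edges first. The task names, traffic rates, node names, and the `slots_per_node` parameter are all hypothetical.

```python
def inter_node_traffic(placement, traffic):
    """Sum the rates of all task pairs that sit on different nodes."""
    return sum(rate for (a, b), rate in traffic.items()
               if placement[a] != placement[b])


def greedy_colocate(tasks, traffic, nodes, slots_per_node):
    """Toy heuristic (an assumption, not the paper's method):
    visit edges from heaviest to lightest and try to put both
    endpoints on the same node while it still has a free slot."""
    placement = {}
    free = {n: slots_per_node for n in nodes}

    def place(task, preferred=None):
        if task in placement:
            return
        # Prefer the partner's node; otherwise take the emptiest node.
        if preferred is not None and free[preferred] > 0:
            node = preferred
        else:
            node = max(nodes, key=lambda n: free[n])
        if free[node] == 0:
            raise ValueError("not enough slots for all tasks")
        placement[task] = node
        free[node] -= 1

    for (a, b), _ in sorted(traffic.items(), key=lambda kv: -kv[1]):
        place(a, placement.get(b))
        place(b, placement.get(a))
    for t in tasks:  # tasks that appear in no edge
        place(t)
    return placement


# Hypothetical four-task pipeline with tuple rates between tasks.
tasks = ["spout", "split", "count", "sink"]
traffic = {("spout", "split"): 100, ("split", "count"): 80, ("count", "sink"): 5}
placement = greedy_colocate(tasks, traffic, ["n1", "n2"], slots_per_node=2)
print(placement, inter_node_traffic(placement, traffic))
```

A placement that alternates tasks across the two nodes would send every edge over the network, whereas the greedy pass keeps the heaviest edge local. A real scheduler would also have to account for CPU and memory limits, which this sketch ignores.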