Proactive scaling of distributed stream processing work flows using workload modelling: doctoral symposium

Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. Pub Date: 2016-06-13. DOI: 10.1145/2933267.2933429

Thomas Cooper


Citations: 7

Abstract

In recent years there has been significant development in the area of distributed stream processing systems (DSPS) such as Apache Storm, Spark, and Flink. These systems allow complex queries on streaming data to be distributed across multiple worker nodes in a cluster. DSPS often provide tools to add or remove resources and to take advantage of cloud infrastructure to scale their operation. However, the decisions about when and how to scale are generally left to the administrators of these systems. Several studies have focused on finding optimal deployments of DSPS operators across a cluster. However, these approaches often do not optimise with regard to a given Service Level Agreement (SLA), and where they do, they do not take incoming workload into account. To our knowledge there has been little or no work on proactively scaling a DSPS with regard to incoming workload in order to maintain SLAs. This PhD will focus on predicting incoming workloads using time series analysis. To assess whether a given predicted workload will breach an SLA, the response of a DSPS work flow to incoming workload will be modelled using a queuing theory approach. The intention is to build a system that can tune the parameters of this queuing theoretic model while the DSPS is running, using output metrics such as end-to-end latency and throughput. The end result will be a system that can identify potential SLA breaches before they happen and initiate a proactive scaling response. Initially, Apache Storm will be used as the test DSPS; however, it is anticipated that the system developed during this PhD will be applicable to other DSPS that use a graph-based description of the streaming work flow, e.g. Apache Spark and Flink.
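The abstract does not say which queuing model is intended, so the following is only an illustrative sketch of the general idea: given a forecast arrival rate, estimate the end-to-end latency of a work flow and compare it against an SLA threshold. It assumes, purely for illustration, that each operator behaves as an independent M/M/c queue, that every stage sees the same arrival rate, and that the work flow is a simple chain. All function names and the `stages` structure are hypothetical and are not taken from the paper or from any DSPS API.

```python
import math


def erlang_c(servers, offered_load):
    """Probability that an arriving tuple has to queue in an M/M/c system."""
    utilisation = offered_load / servers
    if utilisation >= 1:
        return 1.0
    below = sum(offered_load ** k / math.factorial(k) for k in range(servers))
    tail = offered_load ** servers / (math.factorial(servers) * (1 - utilisation))
    return tail / (below + tail)


def mean_sojourn_time(arrival_rate, service_rate, servers):
    """Mean time (queueing plus service) a tuple spends at one operator.

    Assumption: Poisson arrivals and exponential service times (M/M/c).
    Returns infinity when the operator cannot keep up with the arrivals.
    """
    offered_load = arrival_rate / service_rate
    if offered_load >= servers:
        return math.inf
    wait = erlang_c(servers, offered_load) / (servers * service_rate - arrival_rate)
    return wait + 1 / service_rate


def predicted_latency(arrival_rate, stages):
    """End-to-end latency of a chain of operators.

    `stages` is a hypothetical list of (service_rate, parallelism) pairs;
    every stage is assumed to receive the same arrival rate.
    """
    return sum(mean_sojourn_time(arrival_rate, mu, c) for mu, c in stages)


def breaches_sla(forecast_rate, stages, sla_seconds):
    """True if the forecast workload is predicted to exceed the latency SLA."""
    return predicted_latency(forecast_rate, stages) > sla_seconds


if __name__ == "__main__":
    stages = [(10.0, 1), (25.0, 1)]  # tuples/second per executor, executors
    forecast = 12.0                  # forecast tuples/second (made-up value)
    if breaches_sla(forecast, stages, sla_seconds=1.0):
        print("predicted SLA breach: scale out before the workload arrives")
```

In a running system the service rates would be fitted from observed latency and throughput rather than fixed by hand, and the forecast rate would come from the time series model; both are left out of this sketch.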

