Proactive scaling of distributed stream processing work flows using workload modelling: doctoral symposium
Thomas Cooper
Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, 13 June 2016
DOI: 10.1145/2933267.2933429
In recent years there has been significant development in distributed stream processing systems (DSPS) such as Apache Storm, Spark, and Flink. These systems allow complex queries on streaming data to be distributed across multiple worker nodes in a cluster. DSPS often provide tools to add or remove resources and to take advantage of cloud infrastructure to scale their operation. However, the decision of when and how to scale is generally left to the administrators of these systems. Several studies have focused on finding optimal deployments of DSPS operators across a cluster. However, these approaches often do not optimise with regard to a given Service Level Agreement (SLA), and where they do, they do not take the incoming workload into account. To our knowledge, there has been little or no work on proactively scaling a DSPS in response to incoming workload in order to maintain SLAs. This PhD will focus on predicting incoming workloads using time series analysis. To assess whether a given predicted workload will breach an SLA, the response of a DSPS work flow to incoming workload will be modelled using queueing theory. The intention is to build a system that tunes the parameters of this queueing-theoretic model while the DSPS is running, using output metrics such as end-to-end latency and throughput. The end result will be a system that can identify potential SLA breaches before they happen and initiate a proactive scaling response. Apache Storm will initially be used as the test DSPS; however, the system developed during this PhD is expected to be applicable to other DSPS that describe the streaming work flow as a graph, such as Apache Spark and Flink.
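The abstract does not say which forecasting or queueing models will be used, so the following is only a minimal Python sketch of the predict, model, and scale loop it describes, under several loudly stated assumptions. Holt's linear exponential smoothing stands in for the time series forecaster; each operator instance is treated as an independent M/M/1 queue receiving an equal share of the traffic; the topology is a linear pipeline in which every operator sees the full input rate (no filtering or selectivity); and scale-out is a simple greedy rule. All names (`holt_forecast`, `Operator`, `sojourn_time`, `predicted_latency`, `scale_for_sla`) and all numeric values are hypothetical. They are not Apache Storm APIs and are not the author's method.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Sequence


def holt_forecast(series: Sequence[float], alpha: float = 0.5,
                  beta: float = 0.3, horizon: int = 1) -> float:
    """Holt's linear (double) exponential smoothing.

    Returns the forecast `horizon` intervals ahead of the last observation.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


@dataclass
class Operator:
    name: str
    service_rate: float  # tuples/s one instance can process (hypothetical, tuned at runtime)
    parallelism: int     # number of parallel instances of the operator


def sojourn_time(op: Operator, arrival_rate: float) -> float:
    """Mean time (s) a tuple spends at `op`, treating each instance as an
    M/M/1 queue that receives an equal share of `arrival_rate` (tuples/s)."""
    per_instance = arrival_rate / op.parallelism
    if per_instance >= op.service_rate:
        return math.inf  # unstable queue: the backlog grows without bound
    return 1.0 / (op.service_rate - per_instance)


def predicted_latency(pipeline: List[Operator], arrival_rate: float) -> float:
    """End-to-end latency of a linear pipeline in which every operator sees
    the full arrival rate: the sum of per-operator sojourn times."""
    return sum(sojourn_time(op, arrival_rate) for op in pipeline)


def scale_for_sla(pipeline: List[Operator], arrival_rate: float,
                  sla_latency: float, max_parallelism: int = 64) -> List[Operator]:
    """Greedily add one instance to the slowest operator until the predicted
    latency meets the SLA. Returns a new plan; the input is not modified."""
    plan = [replace(op) for op in pipeline]
    while predicted_latency(plan, arrival_rate) > sla_latency:
        worst = max(plan, key=lambda op: sojourn_time(op, arrival_rate))
        if worst.parallelism >= max_parallelism:
            raise RuntimeError(f"SLA unreachable: {worst.name} is at max parallelism")
        worst.parallelism += 1
    return plan


if __name__ == "__main__":
    # Hypothetical per-interval input rates (tuples/s) observed at the source.
    history = [800.0, 850.0, 910.0, 960.0, 1020.0]
    rate = holt_forecast(history, horizon=3)  # predicted rate three intervals ahead
    current = [Operator("parse", 1500.0, 1), Operator("enrich", 400.0, 2)]
    sla = 0.05  # hypothetical 50 ms end-to-end latency target
    if predicted_latency(current, rate) > sla:
        plan = scale_for_sla(current, rate, sla)
        print({op.name: op.parallelism for op in plan})
```

In this sketch `service_rate` plays the role of the model parameters that the proposed system would re-estimate from observed latency and throughput; here it is simply a fixed input. The M/M/1 simplification ignores network transfer, serialisation, and operator selectivity, so the predicted latency should be read as a rough estimate rather than a guarantee.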