Real-Time Stream Data Processing at Scale

M. HoseinyFarahabady, A. Jannesari, Wei Bao, Z. Tari, Albert Y. Zomaya
DOI: 10.1109/PDCAT46702.2019.00020
Published in: 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), December 2019
Citations: 0

Abstract

A typical scenario in a stream data-flow processing engine is that users submit continuous queries in order to receive computational results as soon as a new stream of data arrives. The focus of this paper is the design of a dynamic CPU cap controller for stream data-flow applications with real-time constraints, in which the result of a computation must be available within a short, user-specified time period after an update to the input data occurs. The stream data-flow processing engine is commonly deployed over a cluster of dedicated or virtualized server nodes, e.g., a Cloud or Edge platform, to achieve faster data processing. However, the attributes of the incoming stream data-flow might fluctuate in an irregular way. To cope effectively with such unpredictable conditions, the underlying resource manager needs to be equipped with a dynamic resource provisioning mechanism that ensures the real-time requirements of different applications. The proposed solution uses control theory principles to achieve good utilization of computing resources and a reduced average response time. The proposed algorithm dynamically adjusts the required quality of service (QoS) in an environment where multiple stream and data-flow processing applications run concurrently with unknown and volatile workloads. Our study confirms that such unpredictable demand can degrade system performance, mainly due to adverse interference in the utilization of shared resources. Unlike prior research studies, which assume a static or zero correlation in performance variability among consolidated applications, we treat the prevalence of shared-resource interference among collocated applications as a key performance-limiting parameter and confront it in scenarios where several applications have different QoS requirements and unpredictable workload demands.
We design a low-overhead controller to achieve two natural optimization objectives: minimizing the amount of QoS violation and maximizing average CPU utilization. The algorithm takes advantage of design principles from model predictive control theory for elastic allocation of CPU shares. The experimental results confirm a strong correlation between performance degradation under consolidation strategies and the system utilization when applications obtain shared-resource capacity in a non-cooperative manner. The results confirm that the proposed solution can reduce the average latency of delay-sensitive applications by 17% compared to a well-established heuristic called Class-Based Weighted Fair Queuing (CBWFQ). At the same time, the proposed solution can reduce QoS violation incidents by 62%.
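The abstract describes a controller that elastically adjusts per-application CPU caps to balance QoS violations against utilization. The paper's actual model-predictive formulation is not given here; the sketch below is only an illustrative one-step feedback approximation of that idea, with all names and parameters (`gain`, `floor`, the latency fields) being hypothetical.

```python
# Illustrative sketch (not the paper's implementation): a one-step
# feedback CPU-cap controller. Each application has a user-specified
# latency target; its share is nudged proportionally to its QoS error,
# then all shares are renormalized so they sum to the cluster capacity.

def adjust_cpu_caps(apps, total_capacity, gain=0.5, floor=0.05):
    """apps: list of dicts with 'share', 'observed_latency', 'target_latency'."""
    for app in apps:
        # Relative QoS error: positive when the app is missing its deadline.
        error = (app["observed_latency"] - app["target_latency"]) / app["target_latency"]
        # Proportional update: grant more CPU to apps violating their QoS.
        app["share"] = max(floor, app["share"] * (1.0 + gain * error))
    # Renormalize so the allocations never exceed available capacity,
    # mirroring the utilization-maximization objective in the abstract.
    total = sum(app["share"] for app in apps)
    for app in apps:
        app["share"] = app["share"] / total * total_capacity
    return apps
```

Run periodically (e.g., once per control interval), this loop shifts CPU share from over-provisioned applications to those breaching their latency targets; a real MPC controller would instead optimize the adjustment over a prediction horizon of forecast workloads.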