Data Driven Priority Scheduling on a Spark Streaming System

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Pub Date: 2019-05-01, DOI: 10.1109/CCGRID.2019.00072

Tobi Ajila, S. Majumdar


Citations: 1

Abstract



Big data has become essential for businesses: it enables companies and organizations to gather insights from their data and use them to identify marketing opportunities, assist decision-making, or even find new business opportunities. Companies spend a great deal of effort collecting large amounts of data, which in some cases must be processed in real time in order to capitalize on business opportunities. Predicting the expected input load at a given point in time can be very difficult and sometimes impossible. As a result, a great deal of effort is put into creating techniques to address varying input loads. A widely used approach is dynamic resource provisioning, but resource provisioners may not react in time to address a resource shortage, which can result in increased processing latencies. This paper presents a priority scheduling technique that can be used in conjunction with dynamic and static resource provisioning. This approach allows users to assign a priority to input data items. The scheduler ensures that higher-priority data items are given precedence over lower-priority data items. This means that when resources become constrained, the higher-priority data items receive a greater share of resources and experience lower queueing delays than low-priority items. A prototype of the data-driven priority scheduler is implemented on the Spark Streaming system.
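The precedence rule described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' prototype and it does not use any Spark Streaming API. It is a toy model that assumes a fixed per-batch capacity, integer priorities where a lower number means higher priority, and first-in-first-out order within a priority level. The class name `PriorityScheduler` and its methods are hypothetical.

```python
import heapq
import itertools


class PriorityScheduler:
    """Toy data-item scheduler (illustrative only, not the paper's design).

    Assumption: a lower integer means a higher priority.
    Assumption: each batch can process at most `capacity_per_batch` items.
    """

    def __init__(self, capacity_per_batch):
        self.capacity_per_batch = capacity_per_batch
        self._heap = []                      # entries: (priority, arrival_no, item)
        self._arrival = itertools.count()    # tie-breaker keeps FIFO order per priority

    def submit(self, item, priority):
        """Queue an input data item with a user-assigned priority."""
        heapq.heappush(self._heap, (priority, next(self._arrival), item))

    def next_batch(self):
        """Remove and return up to `capacity_per_batch` items, highest priority first."""
        batch = []
        while self._heap and len(batch) < self.capacity_per_batch:
            _, _, item = heapq.heappop(self._heap)
            batch.append(item)
        return batch


# Five items arrive, but only two can be processed per batch (constrained resources).
scheduler = PriorityScheduler(capacity_per_batch=2)
for name, prio in [("low-1", 2), ("high-1", 0), ("low-2", 2), ("high-2", 0), ("mid-1", 1)]:
    scheduler.submit(name, prio)

first = scheduler.next_batch()
second = scheduler.next_batch()
third = scheduler.next_batch()
print(first, second, third)
```

In this toy run both high-priority items leave the queue in the first batch even though a low-priority item arrived before them, while the last low-priority item waits until the third batch. That is the queueing-delay effect the abstract describes for constrained resources.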


Source journal

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

Self-citation rate

0.00%

Publication count