DAG-Aware Joint Task Scheduling and Cache Management in Spark Clusters

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI:10.1109/IPDPS47924.2020.00047

Yinggen Xu, Liu Liu, Zhijun Ding

{"title":"DAG-Aware Joint Task Scheduling and Cache Management in Spark Clusters","authors":"Yinggen Xu, Liu Liu, Zhijun Ding","doi":"10.1109/IPDPS47924.2020.00047","DOIUrl":null,"url":null,"abstract":"Data dependency, often presented as directed acyclic graph (DAG), is a crucial application semantics for the performance of data analytic platforms such as Spark. Spark comes with two built-in schedulers, namely FIFO and Fair scheduler, which do not take advantage of data dependency structures. Recently proposed DAG-aware task scheduling approaches, notably GRAPHENE, have achieved significant performance improvements but paid little attention to cache management. The resulted data access patterns interact poorly with the built-in LRU caching, leading to significant cache misses and performance degradation. On the other hand, DAG-aware caching schemes, such as Most Reference Distance (MRD), are designed for FIFO scheduler instead of DAG-aware task schedulers.In this paper, we propose and develop a middleware Dagon, which leverages the complexity and heterogeneity of DAGs to jointly execute task scheduling and cache management. Dagon relies on three key mechanisms: DAG-aware task assignment that considers dependency structure and heterogeneous resource demands to reduce potential resource fragmentation, sensitivity-aware delay scheduling that prevents executors from long waiting for tasks insensitive to locality, and priority-aware caching that makes the cache eviction and prefetching decisions based on the stage priority determined by DAG-aware task assignment. We have implemented Dagon in Apache Spark. Evaluation on a testbed shows that Dagon improves the job completion time by up to 42% and CPU utilization by up to 46% respectively, compared to GRAPHENE plus MRD.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"16 1","pages":"378-387"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS47924.2020.00047","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

Abstract

Data dependency, often presented as directed acyclic graph (DAG), is a crucial application semantics for the performance of data analytic platforms such as Spark. Spark comes with two built-in schedulers, namely FIFO and Fair scheduler, which do not take advantage of data dependency structures. Recently proposed DAG-aware task scheduling approaches, notably GRAPHENE, have achieved significant performance improvements but paid little attention to cache management. The resulted data access patterns interact poorly with the built-in LRU caching, leading to significant cache misses and performance degradation. On the other hand, DAG-aware caching schemes, such as Most Reference Distance (MRD), are designed for FIFO scheduler instead of DAG-aware task schedulers.In this paper, we propose and develop a middleware Dagon, which leverages the complexity and heterogeneity of DAGs to jointly execute task scheduling and cache management. Dagon relies on three key mechanisms: DAG-aware task assignment that considers dependency structure and heterogeneous resource demands to reduce potential resource fragmentation, sensitivity-aware delay scheduling that prevents executors from long waiting for tasks insensitive to locality, and priority-aware caching that makes the cache eviction and prefetching decisions based on the stage priority determined by DAG-aware task assignment. We have implemented Dagon in Apache Spark. Evaluation on a testbed shows that Dagon improves the job completion time by up to 42% and CPU utilization by up to 46% respectively, compared to GRAPHENE plus MRD.

查看原文本刊更多论文

基于dag的Spark集群联合任务调度与缓存管理

数据依赖关系，通常表现为有向无环图(DAG)，是数据分析平台(如Spark)性能的关键应用语义。Spark带有两个内置调度器，即FIFO和Fair调度器，它们不利用数据依赖结构。最近提出的dag感知任务调度方法，特别是石墨烯，已经取得了显著的性能改进，但很少关注缓存管理。由此产生的数据访问模式与内置LRU缓存交互不良，导致严重的缓存丢失和性能下降。另一方面，支持dag的缓存方案，如大多数引用距离(MRD)，是为FIFO调度器而不是支持dag的任务调度器设计的。本文提出并开发了一种中间件Dagon，利用dag的复杂性和异构性，共同执行任务调度和缓存管理。Dagon依赖于三个关键机制:考虑依赖关系结构和异构资源需求的dag感知任务分配，以减少潜在的资源碎片;敏感性感知延迟调度，防止执行器长时间等待对位置不敏感的任务;优先级感知缓存，根据dag感知任务分配确定的阶段优先级，做出缓存提取和预取决策。我们已经在Apache Spark中实现了Dagon。测试平台上的评估表明，与石墨烯+ MRD相比，Dagon的作业完成时间和CPU利用率分别提高了42%和46%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量