DAG-Aware Joint Task Scheduling and Cache Management in Spark Clusters

Yinggen Xu, Liu Liu, Zhijun Ding
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 378-387, May 2020. DOI: 10.1109/IPDPS47924.2020.00047
Citations: 11

Abstract

Data dependency, often represented as a directed acyclic graph (DAG), is a crucial piece of application semantics for the performance of data analytics platforms such as Spark. Spark comes with two built-in schedulers, namely the FIFO and Fair schedulers, neither of which takes advantage of data dependency structures. Recently proposed DAG-aware task scheduling approaches, notably GRAPHENE, have achieved significant performance improvements but paid little attention to cache management. The resulting data access patterns interact poorly with Spark's built-in LRU caching, leading to frequent cache misses and performance degradation. On the other hand, DAG-aware caching schemes, such as Most Reference Distance (MRD), are designed for the FIFO scheduler rather than for DAG-aware task schedulers. In this paper, we propose and develop Dagon, a middleware that leverages the complexity and heterogeneity of DAGs to jointly perform task scheduling and cache management. Dagon relies on three key mechanisms: DAG-aware task assignment, which considers dependency structure and heterogeneous resource demands to reduce potential resource fragmentation; sensitivity-aware delay scheduling, which prevents executors from waiting long for tasks that are insensitive to locality; and priority-aware caching, which makes cache eviction and prefetching decisions based on the stage priority determined by DAG-aware task assignment. We have implemented Dagon in Apache Spark. Evaluation on a testbed shows that, compared to GRAPHENE plus MRD, Dagon improves job completion time by up to 42% and CPU utilization by up to 46%.
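The abstract names Dagon's mechanisms but does not give their formulas. The following is a minimal sketch of one plausible reading of how a DAG-derived stage priority could drive cache eviction, prefetching, and a delay-scheduling decision. Everything in it is an assumption for illustration, not Dagon's actual implementation or any Spark API: the critical-path priority (estimated stage cost plus the longest remaining path to a sink), the rule that a cached block is worth the highest priority among stages that still need it, the locality "sensitivity" score, and the hypothetical names `stage_priorities`, `choose_victim`, `prefetch_order`, `should_wait_for_locality`, `threshold`, and `max_wait`.

```python
from typing import Dict, List, Set

NO_FUTURE_USE = float("-inf")  # value of a block no pending stage will read


def stage_priorities(children: Dict[str, List[str]], cost: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical priority: a stage's estimated cost plus the longest
    (critical) remaining path through its downstream stages."""
    memo: Dict[str, float] = {}

    def longest(stage: str) -> float:
        if stage not in memo:
            tail = max((longest(c) for c in children.get(stage, [])), default=0.0)
            memo[stage] = cost[stage] + tail
        return memo[stage]

    return {stage: longest(stage) for stage in cost}


def block_value(consumers: Set[str], done: Set[str], prio: Dict[str, float]) -> float:
    """Assumed rule: a block is worth the top priority among unfinished consumers."""
    pending = [prio[s] for s in consumers if s not in done]
    return max(pending, default=NO_FUTURE_USE)


def choose_victim(cache: Dict[str, Set[str]], done: Set[str], prio: Dict[str, float]) -> str:
    """Evict the cached block whose pending consumers have the lowest priority."""
    return min(cache, key=lambda blk: block_value(cache[blk], done, prio))


def prefetch_order(on_disk: Dict[str, Set[str]], done: Set[str], prio: Dict[str, float]) -> List[str]:
    """Prefetch still-needed blocks, highest-priority consumers first."""
    live = [b for b in on_disk if block_value(on_disk[b], done, prio) > NO_FUTURE_USE]
    return sorted(live, key=lambda b: block_value(on_disk[b], done, prio), reverse=True)


def should_wait_for_locality(sensitivity: float, waited: float,
                             threshold: float = 0.5, max_wait: float = 3.0) -> bool:
    """Locality-insensitive tasks launch at once on any free executor; sensitive
    tasks wait up to max_wait for a data-local slot (classic delay scheduling)."""
    return sensitivity >= threshold and waited < max_wait


# Toy DAG: s1 -> s3, s2 -> {s3, s4}, s3 -> s5, s4 -> s5
children = {"s1": ["s3"], "s2": ["s3", "s4"], "s3": ["s5"], "s4": ["s5"], "s5": []}
cost = {"s1": 4.0, "s2": 1.0, "s3": 2.0, "s4": 6.0, "s5": 1.0}
prio = stage_priorities(children, cost)
done = {"s1"}
cache = {"rdd_a": {"s3"}, "rdd_b": {"s4"}, "rdd_c": {"s1"}}
on_disk = {"rdd_x": {"s5"}, "rdd_y": {"s4"}, "rdd_z": {"s1"}}
print(prio)                                  # {'s1': 7.0, 's2': 8.0, 's3': 3.0, 's4': 7.0, 's5': 1.0}
print(choose_victim(cache, done, prio))      # rdd_c (its only consumer s1 has finished)
print(prefetch_order(on_disk, done, prio))   # ['rdd_y', 'rdd_x']
```

The contrast with LRU is the point of the design: LRU evicts whichever block was touched least recently, regardless of what the scheduler will run next, whereas a priority-driven rule keeps blocks that a high-priority pending stage still needs and drops blocks with no remaining consumers first.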