Emergent Scheduling of Distributed Execution Frameworks

Paul Dean
Published in: 2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS*W)
Publication date: 2019-04-12
DOI: https://doi.org/10.1109/FAS-W.2019.00063
Citations: 0

Abstract

Distributed Execution Frameworks (DEFs) provide a platform for handling the increasing volume of data available to distributed computational processes, which has driven the creation and use of a large number of DEFs for performing distributed computations. Examples include sorting and analyzing large data sets through map and reduce operations; performing a set of operations across points in a data stream to provide near real-time analysis; and training and testing machine learning models under varying methods of learning, such as supervised, unsupervised and reinforcement learning, exploiting the vast amounts of data available. As a result, different DEFs have become optimal for either fine-grained or coarse-grained computations. For example, Apache Spark provides a framework for coarse-grained data-parallel processes that prioritises data locality, which adds latency to scheduling decisions and would hinder the performance of fine-grained computation. Ray and Apache Flink, by contrast, avoid the latency incurred by the scheduling method used by Apache Spark, while potentially incurring longer job completion times because data locality is no longer a priority. This PhD will therefore focus on overcoming the need to trade performance across differing workloads by exploiting the capabilities of emergent software systems, which learn how to assemble and re-assemble themselves in response to their current deployment conditions and input pattern. This allows the creation of a component-based DEF capable of altering both the local behaviour of a DEF (i.e. local schedulers and placement policies within a centralised scheduler), to potentially improve the performance of a single DEF, and the global behaviour of a DEF, for example adapting from a centralised to a two-level scheduler.
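The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the framework proposed in the work: all names (`Task`, `Node`, `AdaptiveScheduler`, `fine_grained_threshold`) are hypothetical, and the fixed threshold rule is only a stand-in for the learning that an emergent software system would perform. The sketch treats a placement policy as a swappable component and switches from locality-first placement to least-loaded placement once recent tasks look fine-grained.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    name: str
    est_seconds: float                # estimated run time (hypothetical field)
    data_node: Optional[str] = None   # node holding the task's input, if any


@dataclass
class Node:
    name: str
    queued: int = 0                   # number of tasks already assigned


# A placement policy is a plain function: (task, nodes) -> chosen node.
PlacementPolicy = Callable[[Task, List[Node]], Node]


def least_loaded(task: Task, nodes: List[Node]) -> Node:
    """Ignore data locality and pick the node with the shortest queue."""
    return min(nodes, key=lambda n: n.queued)


def locality_first(task: Task, nodes: List[Node]) -> Node:
    """Prefer the node holding the input data; otherwise fall back to least loaded."""
    for node in nodes:
        if node.name == task.data_node:
            return node
    return least_loaded(task, nodes)


@dataclass
class AdaptiveScheduler:
    nodes: List[Node]
    fine_grained_threshold: float = 0.1   # seconds; illustrative value only
    policy: PlacementPolicy = locality_first
    history: List[float] = field(default_factory=list)

    def adapt(self) -> None:
        """Swap the placement component based on the last ten task durations.

        Assumption: a real emergent system would learn this choice from
        observed performance rather than apply a fixed threshold.
        """
        recent = self.history[-10:]
        if not recent:
            return
        mean = sum(recent) / len(recent)
        if mean < self.fine_grained_threshold:
            self.policy = least_loaded
        else:
            self.policy = locality_first

    def submit(self, task: Task) -> Node:
        """Record the task, re-evaluate the policy, then place the task."""
        self.history.append(task.est_seconds)
        self.adapt()
        node = self.policy(task, self.nodes)
        node.queued += 1
        return node
```

In this sketch, a scheduler created with `AdaptiveScheduler([Node("a"), Node("b")])` keeps placing tasks on the node that holds their data while long-running tasks dominate the recent history, and starts balancing load across nodes once the last ten submissions are all short. Changing the global structure, such as moving from a centralised to a two-level scheduler, would need a similar swap at a coarser level and is not shown here.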