{"title":"Emergent Scheduling of Distributed Execution Frameworks","authors":"Paul Dean","doi":"10.1109/FAS-W.2019.00063","DOIUrl":null,"url":null,"abstract":"Distributed execution Frameworks (DEFs) provide a platform for handling the increasing volume of data available to distributed computational processes, forming the creation and usage of a large number of DEFs for performing distributed computations. For example, sorting and analyzing large data sets through map and reduce operations, performing a set of operations across points in a data stream to provide near real-time analysis, and the training and testing of machine learning models for varying methods of learning, such as, supervised, unsupervised and reinforcement learning, exploiting the vast amounts of data available. Leading to varying DEFs becoming optimal for either fine or coarse grained computations, for example Apache Spark provides a framework for coarse grained data parallel processes providing data locality adding latency to scheduling decisions which would hinder performance of fine-grained computation. Whereas Ray and Apache Flink provide solutions to avoid the latency incurred by the scheduling method used by apache Spark while potentially incurring longer job completion times as data locality is no longer a priority. Therefore, this PhD will focus on overcoming the issue of trading performance for differing workloads by exploiting the capabilities presented by emergent software systems which learn how to assemble and re-assemble themselves in response to their current deployment conditions and input pattern. This allows the creation of a component based DEF capable of altering both the local behaviour of a DEF (i.e. 
Local Schedulers and placement polices within a centralised scheduler) to potentially improve the performance of single DEF as well as global behaviour of a DEF, for example the adaptation of a centralised to two-level scheduler.","PeriodicalId":368308,"journal":{"name":"2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS*W)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS*W)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FAS-W.2019.00063","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Distributed Execution Frameworks (DEFs) provide a platform for handling the increasing volume of data available to distributed computational processes, which has driven the creation and use of a large number of DEFs for performing distributed computations. Examples include sorting and analysing large data sets through map and reduce operations, applying a set of operations across points in a data stream to provide near real-time analysis, and training and testing machine learning models under varying learning paradigms (supervised, unsupervised, and reinforcement learning) that exploit the vast amounts of data available. As a result, different DEFs are optimal for either fine- or coarse-grained computations. For example, Apache Spark provides a framework for coarse-grained data-parallel processing that prioritises data locality, which adds latency to scheduling decisions and would hinder the performance of fine-grained computation. Ray and Apache Flink, by contrast, avoid the latency incurred by Apache Spark's scheduling method, while potentially incurring longer job completion times because data locality is no longer a priority. This PhD will therefore focus on overcoming the problem of trading performance across differing workloads by exploiting the capabilities of emergent software systems, which learn how to assemble and re-assemble themselves in response to their current deployment conditions and input patterns. This enables a component-based DEF capable of altering both the local behaviour of a DEF (i.e., the local schedulers and placement policies within a centralised scheduler), to potentially improve the performance of a single DEF, and the global behaviour of a DEF, for example adapting from a centralised to a two-level scheduler.
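The abstract's core idea, a component-based scheduler that re-assembles itself in response to its observed workload, can be illustrated with a minimal sketch. This is not the author's implementation; all class names, the sliding-window size, and the 100 ms granularity threshold are illustrative assumptions. It contrasts a locality-aware placement component (in the spirit of Apache Spark's scheduling) with a low-latency placement component (in the spirit of Ray/Flink), and swaps between them based on the median duration of recent tasks:

```python
import random
import statistics

class LocalityAwarePolicy:
    """Prefers the node holding the task's input data (Spark-style).

    Waiting for the preferred node adds scheduling latency, which is
    acceptable for coarse-grained tasks but hurts fine-grained ones.
    """
    def place(self, task, nodes):
        return task.get("data_node") or random.choice(nodes)

class LowLatencyPolicy:
    """Places the task on any node immediately, ignoring locality
    (Ray/Flink-style): lower scheduling latency, but jobs may run
    longer because data must be fetched remotely."""
    def place(self, task, nodes):
        return random.choice(nodes)

class EmergentScheduler:
    """Hypothetical component-based scheduler that alters its local
    behaviour (the placement-policy component) at runtime based on
    observed task granularity."""
    def __init__(self, nodes, threshold_ms=100.0, window=50):
        self.nodes = nodes
        self.threshold_ms = threshold_ms  # fine/coarse granularity cut-off (assumed)
        self.window = window              # sliding window of recent tasks (assumed)
        self.recent_durations = []
        self.policy = LocalityAwarePolicy()

    def record(self, duration_ms):
        """Feed back an observed task duration, then re-assemble if needed."""
        self.recent_durations.append(duration_ms)
        self.recent_durations = self.recent_durations[-self.window:]
        self._adapt()

    def _adapt(self):
        # Swap the placement component once enough evidence accumulates:
        # fine-grained workloads (short tasks) get the low-latency policy,
        # coarse-grained ones get the locality-aware policy.
        if len(self.recent_durations) < 10:
            return
        if statistics.median(self.recent_durations) < self.threshold_ms:
            self.policy = LowLatencyPolicy()
        else:
            self.policy = LocalityAwarePolicy()

    def schedule(self, task):
        return self.policy.place(task, self.nodes)
```

A real emergent system would search over many more composition choices (including global changes such as moving from a centralised to a two-level scheduler), but the feedback loop, observe, then swap components, is the same shape as this sketch.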