Task Scheduling in Sucuri Dataflow Library
Rafael J. N. Silva, Brunno F. Goldstein, Leandro Santiago, A. Sena, L. A. J. Marzulo, Tiago A. O. Alves, F. França
2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), published 2016-10-01
DOI: 10.1109/SBAC-PADW.2016.15 (https://doi.org/10.1109/SBAC-PADW.2016.15)
Sucuri is a minimalistic Python library that provides dataflow programming through a reasonably simple syntax. It allows transparent execution on computer clusters and natural exploitation of parallelism. In Sucuri, programmers instantiate a dataflow graph in which each node is assigned a function and edges represent data dependencies between nodes. The original implementation of Sucuri adopts a centralized scheduler, which incurs high communication overheads, especially in clusters with a large number of machines. In this paper we modify Sucuri so that each machine in a cluster has its own scheduler. Before execution, the dataflow graph is partitioned so that its nodes can be distributed among the machines of the cluster. At runtime, idle workers grab tasks from a ready queue in their local scheduler. Experimental results confirm that the solution can reduce communication overheads, improving performance in larger clusters.
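The scheme described in the abstract can be illustrated with a small, single-process sketch. This is not Sucuri's API: the names `Node`, `LocalScheduler`, `partition`, and `run` are hypothetical, the round-robin partitioning is an assumption chosen for brevity (the abstract does not say which partitioning strategy is used), and machines are simulated as plain objects rather than cluster processes. The sketch only shows the idea of one ready queue per machine and counts how many operands would cross machine boundaries.

```python
from collections import deque


class Node:
    """A dataflow node: a function plus the edges that carry its result."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.n_inputs = n_inputs
        self.dests = []  # list of (destination node, input port)

    def add_edge(self, dest, port):
        self.dests.append((dest, port))


class LocalScheduler:
    """Hypothetical per-machine scheduler with its own ready queue."""

    def __init__(self, machine_id):
        self.machine_id = machine_id
        self.ready = deque()  # tasks whose operands have all arrived
        self.pending = {}     # node name -> {port: operand value}

    def deliver(self, node, port, value):
        operands = self.pending.setdefault(node.name, {})
        operands[port] = value
        if len(operands) == node.n_inputs:
            del self.pending[node.name]
            args = [operands[p] for p in range(node.n_inputs)]
            self.ready.append((node, args))


def partition(nodes, n_machines):
    """Assumed strategy: assign nodes to machines round-robin."""
    return {node.name: i % n_machines for i, node in enumerate(nodes)}


def run(nodes, n_machines):
    placement = partition(nodes, n_machines)
    schedulers = [LocalScheduler(m) for m in range(n_machines)]
    results = {}
    remote_messages = 0

    # Nodes without inputs are ready from the start.
    for node in nodes:
        if node.n_inputs == 0:
            schedulers[placement[node.name]].ready.append((node, []))

    # Each simulated machine takes one task at a time from its own queue.
    while any(s.ready for s in schedulers):
        for sched in schedulers:
            if not sched.ready:
                continue
            node, args = sched.ready.popleft()
            value = node.func(*args)
            results[node.name] = value
            for dest, port in node.dests:
                target = placement[dest.name]
                if target != sched.machine_id:
                    remote_messages += 1  # operand crosses machines
                schedulers[target].deliver(dest, port, value)
    return results, remote_messages


# Example graph: (2 + 3) squared.
a = Node("a", lambda: 2, 0)
b = Node("b", lambda: 3, 0)
add = Node("add", lambda x, y: x + y, 2)
square = Node("square", lambda x: x * x, 1)
a.add_edge(add, 0)
b.add_edge(add, 1)
add.add_edge(square, 0)

results, remote_messages = run([a, b, add, square], n_machines=2)
print(results["square"], remote_messages)
```

In this sketch, only operands whose producer and consumer sit on different machines count as communication, so the quality of the partition directly determines the traffic. Running the same graph with `n_machines=1` produces no remote messages at all.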