M-DFCPP: A runtime library for multi-machine dataflow computing
Authors: Qiuming Luo, Senhong Liu, Jinke Huang, Jinrong Li
Journal: Concurrency and Computation: Practice and Experience, vol. 36, no. 24
Publication date: 2024-08-07
DOI: 10.1002/cpe.8248
URL: https://onlinelibrary.wiley.com/doi/10.1002/cpe.8248
Abstract
This article designs and implements a runtime library for general dataflow programming, DFCPP (Luo Q, Huang J, Li J, Du Z. Proceedings of the 52nd International Conference on Parallel Processing Workshops. ACM; 2023:145-152.), and builds upon it to design and implement a multi-machine C++ dataflow library, M-DFCPP. Compared with existing dataflow programming environments, DFCPP features a user-friendly interface and richer expressive capabilities (Luo Q, Huang J, Li J, Du Z. Proceedings of the 52nd International Conference on Parallel Processing Workshops. ACM; 2023:145-152.), enabling the representation of various types of dataflow actor tasks (static, dynamic, and conditional tasks). In addition, DFCPP addresses memory management and task scheduling for non-uniform memory access (NUMA) architectures, issues to which other dataflow libraries pay little attention. M-DFCPP extends the capability of current dataflow runtime libraries (DFCPP, taskflow, openstream, etc.) to multi-machine computing while keeping its API compatible with DFCPP. M-DFCPP adopts the concepts of master and follower (Dean J, Ghemawat S. Commun ACM. 2008;51(1):107-113; Ghemawat S, Gobioff H, Leung ST. ACM SIGOPS Operating Systems Review. ACM; 2003:29-43.), which form a worksharing framework, as in many multi-machine systems. To shift to the M-DFCPP framework, a filtering layer is inserted into the original DFCPP, transforming it into followers that can cooperate with each other. The master consists of modules for scheduling, data processing, graph partitioning, state management, and so forth. In benchmark tests on workloads whose directed acyclic graph topologies are binary trees and random graphs, DFCPP demonstrated performance improvements of 20% and 8%, respectively, compared to the second-fastest library.
M-DFCPP consistently performs well across varying levels of concurrency and task workloads, achieving a maximum speedup of more than 20 over DFCPP when the task parallelism exceeds 5000 on 32 nodes. Moreover, M-DFCPP, as a runtime library supporting multi-node dataflow computation, is compared with MPI, a runtime library supporting multi-node control-flow computation.
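To make the dataflow model in the abstract concrete, the following is a minimal single-threaded sketch of the firing rule (a task runs once all of its producers have finished), applied to a binary-tree reduction like the benchmark topology mentioned above. It is not the DFCPP or M-DFCPP API: the names `Graph`, `add_task`, `add_edge`, `run`, and `tree_sum` are hypothetical, and the sketch leaves out parallel workers, NUMA-aware scheduling, and the master/follower layer entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical minimal dataflow graph (illustration only, not DFCPP's API).
class Graph {
public:
    using Task = std::function<void()>;

    std::size_t size() const { return tasks_.size(); }

    // Registers a task and returns its id.
    std::size_t add_task(Task fn) {
        tasks_.push_back(std::move(fn));
        successors_.emplace_back();
        pending_.push_back(0);
        return tasks_.size() - 1;
    }

    // Declares that task `to` consumes data produced by task `from`.
    void add_edge(std::size_t from, std::size_t to) {
        assert(from < tasks_.size() && to < tasks_.size());
        successors_[from].push_back(to);
        ++pending_[to];
    }

    // Fires every task whose inputs are complete; returns the number fired.
    std::size_t run() {
        std::vector<std::size_t> pending = pending_;  // work on a copy
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < pending.size(); ++i)
            if (pending[i] == 0) ready.push(i);

        std::size_t fired = 0;
        while (!ready.empty()) {
            const std::size_t id = ready.front();
            ready.pop();
            tasks_[id]();
            ++fired;
            // Completing a task may make its consumers ready.
            for (std::size_t s : successors_[id])
                if (--pending[s] == 0) ready.push(s);
        }
        return fired;
    }

private:
    std::vector<Task> tasks_;
    std::vector<std::vector<std::size_t>> successors_;
    std::vector<std::size_t> pending_;  // unfinished producers per task
};

// Sums `leaves` with a binary-tree-shaped task graph.
long tree_sum(const std::vector<long>& leaves) {
    assert(!leaves.empty());
    Graph g;
    // slot[id] holds the value produced by task id; a reduction over n
    // leaves creates at most 2n - 1 tasks, so the vector never resizes.
    std::vector<long> slot(2 * leaves.size(), 0);
    std::vector<std::size_t> level;
    for (std::size_t i = 0; i < leaves.size(); ++i) {
        const std::size_t id = g.size();
        g.add_task([&slot, &leaves, i, id] { slot[id] = leaves[i]; });
        level.push_back(id);
    }
    while (level.size() > 1) {
        std::vector<std::size_t> next;
        for (std::size_t i = 0; i + 1 < level.size(); i += 2) {
            const std::size_t a = level[i], b = level[i + 1], id = g.size();
            g.add_task([&slot, a, b, id] { slot[id] = slot[a] + slot[b]; });
            g.add_edge(a, id);
            g.add_edge(b, id);
            next.push_back(id);
        }
        if (level.size() % 2 == 1) next.push_back(level.back());  // carry odd node up
        level = std::move(next);
    }
    g.run();
    return slot[level.front()];
}
```

Calling `tree_sum` on a vector of values builds the tree and executes it in dependency order. In a real runtime the ready queue would be drained by multiple worker threads, and in a multi-machine setting the graph would additionally be partitioned across followers; both are beyond this sketch.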
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.