M-DFCPP: A runtime library for multi-machine dataflow computing

IF 1.5 · Zone 4 (Computer Science) · JCR Q3 · COMPUTER SCIENCE, SOFTWARE ENGINEERING

Concurrency and Computation-Practice & Experience Pub Date : 2024-08-07 DOI:10.1002/cpe.8248

Qiuming Luo, Senhong Liu, Jinke Huang, Jinrong Li

Volume 36, Issue 24 · Journal Article · https://onlinelibrary.wiley.com/doi/10.1002/cpe.8248

Citations: 0

Abstract

This article designs and implements DFCPP, a runtime library for general dataflow programming (Luo Q, Huang J, Li J, Du Z. Proceedings of the 52nd International Conference on Parallel Processing Workshops. ACM; 2023:145-152.), and builds on it to design and implement M-DFCPP, a multi-machine C++ dataflow library. Compared with existing dataflow programming environments, DFCPP offers a user-friendly interface and richer expressive capabilities (Luo Q, Huang J, Li J, Du Z. Proceedings of the 52nd International Conference on Parallel Processing Workshops. ACM; 2023:145-152.), and can represent several types of dataflow actor tasks (static, dynamic, and conditional). DFCPP also addresses memory management and task scheduling for non-uniform memory access (NUMA) architectures, issues to which other dataflow libraries pay little attention. M-DFCPP extends the capability of current dataflow runtime libraries (DFCPP, taskflow, openstream, etc.) to multi-machine computing while keeping its API compatible with DFCPP. M-DFCPP adopts the concepts of master and follower (Dean J, Ghemawat S. Commun ACM. 2008;51(1):107-113; Ghemawat S, Gobioff H, Leung ST. ACM SIGOPS Operating Systems Review. ACM; 2003:29-43.), which form a work-sharing framework, as in many multi-machine systems. To move to the M-DFCPP framework, a filtering layer is inserted into the original DFCPP, transforming it into followers that can cooperate with each other. The master consists of modules for scheduling, data processing, graph partitioning, state management, and so forth. In benchmark tests on workloads whose directed acyclic graph topologies are binary trees and random graphs, DFCPP outperformed the second-fastest library by 20% and 8%, respectively. M-DFCPP performs consistently well across varying levels of concurrency and task workloads, achieving a maximum speedup of more than 20× over DFCPP when the task parallelism exceeds 5000 on 32 nodes. Finally, M-DFCPP, as a runtime library supporting multi-node dataflow computation, is compared with MPI, a runtime library supporting multi-node control-flow computation.
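The abstract does not show the library's API, so the following is only a minimal C++ sketch of the dataflow firing rule it relies on: a task runs once all of its inputs have been produced. The names `Task`, `Graph`, `connect`, `run`, and `tree_sum` are hypothetical and are not taken from DFCPP or M-DFCPP. The sketch is sequential and single-machine, with no NUMA-aware scheduling, graph partitioning, or master/follower logic. It builds a binary-tree reduction, one of the two graph topologies named in the benchmarks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; NOT the DFCPP or M-DFCPP API.
struct Task {
    std::function<void()> work;           // body executed when the task fires
    std::vector<std::size_t> successors;  // tasks that consume this task's output
    std::size_t pending_inputs = 0;       // number of inputs this task waits for
};

class Graph {
public:
    std::size_t size() const { return tasks_.size(); }

    // Adds a task and returns its id.
    std::size_t add(std::function<void()> work) {
        Task t;
        t.work = std::move(work);
        tasks_.push_back(std::move(t));
        return tasks_.size() - 1;
    }

    // Declares that task `to` consumes data produced by task `from`.
    void connect(std::size_t from, std::size_t to) {
        assert(from < tasks_.size() && to < tasks_.size());
        tasks_[from].successors.push_back(to);
        ++tasks_[to].pending_inputs;
    }

    // Fires every task whose inputs are all available, one at a time,
    // and returns the task ids in firing order. Tasks on a cycle never fire.
    std::vector<std::size_t> run() const {
        std::vector<std::size_t> pending(tasks_.size());
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < tasks_.size(); ++i) {
            pending[i] = tasks_[i].pending_inputs;
            if (pending[i] == 0) ready.push(i);
        }
        std::vector<std::size_t> order;
        while (!ready.empty()) {
            std::size_t id = ready.front();
            ready.pop();
            if (tasks_[id].work) tasks_[id].work();
            order.push_back(id);
            // Producing this output may make successors ready to fire.
            for (std::size_t next : tasks_[id].successors)
                if (--pending[next] == 0) ready.push(next);
        }
        return order;
    }

private:
    std::vector<Task> tasks_;
};

// Sums `leaves` with a binary-tree-shaped task graph: leaf tasks emit values,
// inner tasks add the outputs of their two children.
int tree_sum(const std::vector<int>& leaves) {
    if (leaves.empty()) return 0;
    Graph g;
    std::vector<int> value;          // value[id] holds the output of task id
    std::vector<std::size_t> level;  // task ids on the current tree level
    for (int x : leaves) {
        std::size_t id = g.size();
        value.push_back(0);
        g.add([&value, id, x] { value[id] = x; });
        level.push_back(id);
    }
    while (level.size() > 1) {
        std::vector<std::size_t> next;
        for (std::size_t i = 0; i + 1 < level.size(); i += 2) {
            std::size_t a = level[i], b = level[i + 1];
            std::size_t id = g.size();
            value.push_back(0);
            g.add([&value, id, a, b] { value[id] = value[a] + value[b]; });
            g.connect(a, id);
            g.connect(b, id);
            next.push_back(id);
        }
        if (level.size() % 2 == 1) next.push_back(level.back());  // unpaired node moves up
        level.swap(next);
    }
    g.run();
    return value[level.front()];
}
```

In this sketch all tasks in the ready queue at any moment are mutually independent, so a parallel runtime could hand them to different worker threads or, in a multi-machine setting, to different followers.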

Source journal

Concurrency and Computation-Practice & Experience (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 5.00

Self-citation rate: 10.00%

Publications: 664

Review time: 9.6 months

Journal description: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.