Generation of shortest path search dataflow networks of actors for parallel multi-core implementation

IF 3.4 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
A. Prihozhy
{"title":"Generation of shortest path search dataflow networks of actors for parallel multi-core implementation","authors":"A. Prihozhy","doi":"10.37661/1816-0301-2023-20-2-65-84","DOIUrl":null,"url":null,"abstract":"Objectives. The problem of parallelizing computations on multicore systems is considered. On the Floyd – Warshall blocked algorithm of shortest paths search in dense graphs of large size, two types of parallelism are compared: fork-join and network dataflow. Using the CAL programming language, a method of developing actors and an algorithm of generating parallel dataflow networks are proposed. The objective is to improve performance of parallel implementations of algorithms which have the property of partial order of computations on multicore processors.Methods. Methods of graph theory, algorithm theory, parallelization theory and formal language theory are used.Results. Claims about the possibility of reordering calculations in the blocked Floyd – Warshall algorithm are proved, which make it possible to achieve a greater load of cores during algorithm execution. Based on the claims, a method of constructing actors in the CAL language is developed and an algorithm for automatic generation of dataflow CAL networks for various configurations of block matrices describing the lengths of the shortest paths is proposed. It is proved that the networks have the properties of rate consistency, boundedness, and liveness. In actors running in parallel, the order of execution of actions with asynchronous behavior can change dynamically, resulting in efficient use of caches and increased core load. To implement the new features of actors, networks and the method of their generation, a tunable multi-threaded CAL engine has been developed that implements a static dataflow model of computation with bounded sizes of buffers. From the experimental results obtained on four types of multi-core processors it follows that there is an optimal size of the network matrix of actors for which the performance is maximum, and the size depends on the number of cores and the size of graph.Conclusion. It has been shown that dataflow networks of actors are an effective means to parallelize computationally intensive algorithms that describe a partial order of computations over decomposed data. The results obtained on the blocked algorithm of shortest paths search prove that the parallelism of dataflow networks gives higher performance of software implementations on multicore processors in comparison with the fork-join parallelism of OpenMP.","PeriodicalId":37100,"journal":{"name":"Informatics","volume":null,"pages":null},"PeriodicalIF":3.4000,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.37661/1816-0301-2023-20-2-65-84","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

Objectives. The problem of parallelizing computations on multicore systems is considered. On the Floyd – Warshall blocked algorithm of shortest paths search in dense graphs of large size, two types of parallelism are compared: fork-join and network dataflow. Using the CAL programming language, a method of developing actors and an algorithm of generating parallel dataflow networks are proposed. The objective is to improve performance of parallel implementations of algorithms which have the property of partial order of computations on multicore processors.Methods. Methods of graph theory, algorithm theory, parallelization theory and formal language theory are used.Results. Claims about the possibility of reordering calculations in the blocked Floyd – Warshall algorithm are proved, which make it possible to achieve a greater load of cores during algorithm execution. Based on the claims, a method of constructing actors in the CAL language is developed and an algorithm for automatic generation of dataflow CAL networks for various configurations of block matrices describing the lengths of the shortest paths is proposed. It is proved that the networks have the properties of rate consistency, boundedness, and liveness. In actors running in parallel, the order of execution of actions with asynchronous behavior can change dynamically, resulting in efficient use of caches and increased core load. To implement the new features of actors, networks and the method of their generation, a tunable multi-threaded CAL engine has been developed that implements a static dataflow model of computation with bounded sizes of buffers. From the experimental results obtained on four types of multi-core processors it follows that there is an optimal size of the network matrix of actors for which the performance is maximum, and the size depends on the number of cores and the size of graph.Conclusion. It has been shown that dataflow networks of actors are an effective means to parallelize computationally intensive algorithms that describe a partial order of computations over decomposed data. The results obtained on the blocked algorithm of shortest paths search prove that the parallelism of dataflow networks gives higher performance of software implementations on multicore processors in comparison with the fork-join parallelism of OpenMP.
并行多核实现中参与者最短路径搜索数据流网络的生成
目标。研究了多核系统上的并行计算问题。在大尺寸密集图中最短路径搜索的Floyd–Warshall阻塞算法上,比较了两种类型的并行性:fork-join和网络数据流。使用CAL编程语言,提出了一种开发行动者的方法和生成并行数据流网络的算法。目标是提高算法的并行实现的性能,这些算法在多核处理器上具有计算偏序的性质。方法。使用了图论、算法论、并行化理论和形式语言理论的方法。后果证明了在阻塞的Floyd–Warshall算法中重新排序计算的可能性,这使得在算法执行期间实现更大的核心负载成为可能。基于权利要求,开发了一种用CAL语言构建参与者的方法,并提出了一种用于自动生成数据流CAL网络的算法,该算法适用于描述最短路径长度的块矩阵的各种配置。证明了网络具有速率一致性、有界性和活跃性。在并行运行的参与者中,具有异步行为的操作的执行顺序可以动态变化,从而有效地使用缓存并增加核心负载。为了实现行动者、网络及其生成方法的新功能,开发了一个可调的多线程CAL引擎,该引擎实现了具有有限缓冲区大小的静态计算数据流模型。从在四种类型的多核处理器上获得的实验结果来看,存在性能最大的参与者网络矩阵的最优大小,并且该大小取决于核的数量和图的大小。结论已经表明,参与者的数据流网络是并行化计算密集型算法的有效手段,这些算法描述了分解数据上的部分计算顺序。在最短路径搜索的阻塞算法上获得的结果证明,与OpenMP的fork-join并行性相比,数据流网络的并行性在多核处理器上提供了更高的软件实现性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Informatics
Informatics Social Sciences-Communication
CiteScore
6.60
自引率
6.50%
发文量
88
审稿时长
6 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信