自动发现并行化任务图中的集体通信模式

IF 0.9 4区 计算机科学 Q3 COMPUTER SCIENCE, THEORY & METHODS
Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer
{"title":"自动发现并行化任务图中的集体通信模式","authors":"Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer","doi":"10.1007/s10766-024-00767-y","DOIUrl":null,"url":null,"abstract":"<p>Collective communication APIs equip MPI vendors with the necessary context to optimize cluster-wide operations on the basis of theoretical complexity models and characteristics of the involved interconnects. Modern HPC runtime systems with a programmability focus can perform dependency analysis to eliminate the need for manual communication entirely. Profiting from optimized collective routines in this context often requires global analysis of the implicit point-to-point communication pattern or tight constrains on the data access patterns allowed inside kernels. The Celerity API provides a high degree of freedom for both runtime implementors and application developers by tieing transparent work assignment to data access patterns through user-defined range-mapper functions. Canonically, data dependencies are resolved through an intra-node coherence model and inter-node point-to-point communication. This paper presents Collective Pattern Discovery (CPD), a fully distributed, coordination-free method for detecting collective communication patterns on parallelized task graphs. Through extensive scheduling and communication microbenchmarks as well as a strong scaling experiment on a compute-intensive application, we demonstrate that CPD can achieve substantial performance gains in the Celerity model.</p>","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"18 1","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs\",\"authors\":\"Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer\",\"doi\":\"10.1007/s10766-024-00767-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Collective communication APIs equip MPI vendors with the necessary context to optimize cluster-wide operations on the basis of theoretical complexity models and characteristics of the involved interconnects. Modern HPC runtime systems with a programmability focus can perform dependency analysis to eliminate the need for manual communication entirely. Profiting from optimized collective routines in this context often requires global analysis of the implicit point-to-point communication pattern or tight constrains on the data access patterns allowed inside kernels. The Celerity API provides a high degree of freedom for both runtime implementors and application developers by tieing transparent work assignment to data access patterns through user-defined range-mapper functions. Canonically, data dependencies are resolved through an intra-node coherence model and inter-node point-to-point communication. This paper presents Collective Pattern Discovery (CPD), a fully distributed, coordination-free method for detecting collective communication patterns on parallelized task graphs. Through extensive scheduling and communication microbenchmarks as well as a strong scaling experiment on a compute-intensive application, we demonstrate that CPD can achieve substantial performance gains in the Celerity model.</p>\",\"PeriodicalId\":14313,\"journal\":{\"name\":\"International Journal of Parallel Programming\",\"volume\":\"18 1\",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2024-03-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Parallel Programming\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10766-024-00767-y\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Parallel Programming","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10766-024-00767-y","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0

摘要

集体通信应用程序接口为 MPI 厂商提供了必要的环境,以便根据理论复杂性模型和相关互连的特性优化整个集群的操作。注重可编程性的现代高性能计算运行时系统可以执行依赖性分析,从而完全消除手动通信的需要。在这种情况下,要从优化的集体例程中获益,往往需要对隐含的点对点通信模式进行全局分析,或对内核中允许的数据访问模式进行严格限制。Celerity 应用程序接口(API)通过用户定义的范围映射器(range-mapper)函数,将透明的工作分配与数据访问模式联系起来,从而为运行时实现者和应用程序开发者提供了高度的自由度。从规范上讲,数据依赖性是通过节点内一致性模型和节点间点对点通信来解决的。本文介绍了集体模式发现(CPD),这是一种在并行化任务图上检测集体通信模式的完全分布式、无需协调的方法。通过广泛的调度和通信微基准测试以及计算密集型应用的强扩展实验,我们证明了 CPD 可以在 Celerity 模型中实现大幅性能提升。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs

Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs

Collective communication APIs equip MPI vendors with the necessary context to optimize cluster-wide operations on the basis of theoretical complexity models and characteristics of the involved interconnects. Modern HPC runtime systems with a programmability focus can perform dependency analysis to eliminate the need for manual communication entirely. Profiting from optimized collective routines in this context often requires global analysis of the implicit point-to-point communication pattern or tight constrains on the data access patterns allowed inside kernels. The Celerity API provides a high degree of freedom for both runtime implementors and application developers by tieing transparent work assignment to data access patterns through user-defined range-mapper functions. Canonically, data dependencies are resolved through an intra-node coherence model and inter-node point-to-point communication. This paper presents Collective Pattern Discovery (CPD), a fully distributed, coordination-free method for detecting collective communication patterns on parallelized task graphs. Through extensive scheduling and communication microbenchmarks as well as a strong scaling experiment on a compute-intensive application, we demonstrate that CPD can achieve substantial performance gains in the Celerity model.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
International Journal of Parallel Programming
International Journal of Parallel Programming 工程技术-计算机:理论方法
CiteScore
4.40
自引率
0.00%
发文量
15
审稿时长
>12 weeks
期刊介绍: International Journal of Parallel Programming is a forum for the publication of peer-reviewed, high-quality original papers in the computer and information sciences, focusing specifically on programming aspects of parallel computing systems. Such systems are characterized by the coexistence over time of multiple coordinated activities. The journal publishes both original research and survey papers. Fields of interest include: linguistic foundations, conceptual frameworks, high-level languages, evaluation methods, implementation techniques, programming support systems, pragmatic considerations, architectural characteristics, software engineering aspects, advances in parallel algorithms, performance studies, and application studies.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信