Communication Avoiding Algorithms: Analysis and Code Generation for Parallel Systems

K. Murthy, J. Mellor-Crummey
{"title":"Communication Avoiding Algorithms: Analysis and Code Generation for Parallel Systems","authors":"K. Murthy, J. Mellor-Crummey","doi":"10.1109/PACT.2015.41","DOIUrl":null,"url":null,"abstract":"Data movement is a critical bottleneck for future generations of parallel systems. The class of .5D communication-avoiding algorithms were developed to address this bottleneck. These algorithms reduce communication and provide strong scaling in both time and energy. As a firststep towards automating the development of communication-avoiding-libraries, we developed the Maunam compiler. Maunam generates efficient parallel code from a high-level, global view sketch of .5D algorithms that are expressed using symbolic data sizes and numbers of processors. It supports the expression of data movement and communication through-high-level global operations such as TILT and CSHIFT as well as through element-wise copy operations. With the latter, wrap around communication patterns can also be achieved using subscripts based on modulo operations. Maunam employs polyhedral analysis to reason about communication and computation present in a .5D algorithm. After partitioning data and computation, it inserts point-to-point-and collective communication as needed. Maunam also analyzes data dependence patterns and data layouts to identify reductions over processor subsets. Maunam-generated Fortran+MPI code for 2.5D matrix multiplication running on 4096 cores of a Cray XC30 super computer achieves 59 TFlops/s (76% of the machine peak). Our generated parallel code achieves 91% of the performance of a hand-coded version.","PeriodicalId":385398,"journal":{"name":"2015 International Conference on Parallel Architecture and Compilation (PACT)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Parallel Architecture and Compilation (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACT.2015.41","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Data movement is a critical bottleneck for future generations of parallel systems. The class of .5D communication-avoiding algorithms were developed to address this bottleneck. These algorithms reduce communication and provide strong scaling in both time and energy. As a firststep towards automating the development of communication-avoiding-libraries, we developed the Maunam compiler. Maunam generates efficient parallel code from a high-level, global view sketch of .5D algorithms that are expressed using symbolic data sizes and numbers of processors. It supports the expression of data movement and communication through-high-level global operations such as TILT and CSHIFT as well as through element-wise copy operations. With the latter, wrap around communication patterns can also be achieved using subscripts based on modulo operations. Maunam employs polyhedral analysis to reason about communication and computation present in a .5D algorithm. After partitioning data and computation, it inserts point-to-point-and collective communication as needed. Maunam also analyzes data dependence patterns and data layouts to identify reductions over processor subsets. Maunam-generated Fortran+MPI code for 2.5D matrix multiplication running on 4096 cores of a Cray XC30 super computer achieves 59 TFlops/s (76% of the machine peak). Our generated parallel code achieves 91% of the performance of a hand-coded version.
避免通信的算法:并行系统的分析和代码生成
数据移动是未来几代并行系统的关键瓶颈。为了解决这一瓶颈,开发了一类0.5 d通信避免算法。这些算法减少了通信,并在时间和精力上提供了强大的可伸缩性。作为自动化开发避免通信库的第一步,我们开发了Maunam编译器。Maunam通过使用符号数据大小和处理器数量表示的。5 d算法的高级全局视图草图生成高效的并行代码。它支持通过高级全局操作(如TILT和CSHIFT)以及通过元素复制操作来表达数据移动和通信。对于后者,还可以使用基于模操作的下标来实现封装通信模式。Maunam使用多面体分析来解释0.5 d算法中存在的通信和计算。在划分数据和计算后,根据需要插入点对点和集体通信。Maunam还分析了数据依赖模式和数据布局,以确定处理器子集的减少。maunam生成的用于2.5D矩阵乘法的Fortran+MPI代码在Cray XC30超级计算机的4096个核上运行,达到59 TFlops/s(机器峰值的76%)。我们生成的并行代码达到了手工编码版本的91%的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信