Generating Efficient Tensor Contractions for GPUs

T. Nelson, Axel Rivera, Prasanna Balaprakash, Mary W. Hall, P. Hovland, E. Jessup, B. Norris
Published in: 2015 44th International Conference on Parallel Processing, 2015-09-01. DOI: 10.1109/ICPP.2015.106 (https://doi.org/10.1109/ICPP.2015.106)
Citations: 37

Abstract

Many scientific and numerical applications, including quantum chemistry modeling and fluid dynamics simulation, require tensor product and tensor contraction evaluation. Tensor computations are characterized by arrays with numerous dimensions, inherent parallelism, moderate data reuse, and many degrees of freedom in the order in which to perform the computation. The best-performing implementation is heavily dependent on the tensor dimensionality and the target architecture. In this paper, we map tensor computations to GPUs, starting with a high-level tensor input language and producing efficient CUDA code as output. Our approach is to combine tensor-specific mathematical transformations with a GPU decision algorithm, machine learning, and autotuning of a large parameter space. Generated code shows significant performance gains over sequential and OpenMP parallel code, and a comparison with OpenACC shows the importance of autotuning and other optimizations in our framework for achieving efficient results.
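To make the "degrees of freedom in the order in which to perform the computation" concrete, the following is a minimal sketch in plain Python, not the paper's input language or its generated CUDA. It evaluates one small contraction, C[a][b][c] = sum over k of A[a][k] * B[k][b][c], under every permutation of its four loops. The function name `contract`, the `order` parameter, and the sample tensors are hypothetical and chosen only for illustration.

```python
from itertools import permutations, product

def contract(A, B, order="abck"):
    """Compute C[a][b][c] = sum_k A[a][k] * B[k][b][c] with a chosen loop order.

    `order` is a permutation of the index names "a", "b", "c", "k";
    the first letter is the outermost loop. (Hypothetical helper.)
    """
    extents = {"a": len(A), "k": len(B), "b": len(B[0]), "c": len(B[0][0])}
    C = [[[0.0] * extents["c"] for _ in range(extents["b"])]
         for _ in range(extents["a"])]
    # Every loop permutation accumulates the same products into the same
    # output elements; only the traversal (memory access) order differs.
    for idx in product(*(range(extents[name]) for name in order)):
        i = dict(zip(order, idx))
        C[i["a"]][i["b"]][i["c"]] += A[i["a"]][i["k"]] * B[i["k"]][i["b"]][i["c"]]
    return C

A = [[1, 2], [3, 4], [5, 6]]                # shape (3, 2)
B = [[[1, 0], [0, 1]], [[2, 1], [1, 2]]]    # shape (2, 2, 2)

reference = contract(A, B, "abck")
# All 24 loop orders agree on the result for these integer-valued inputs.
all_orders_agree = all(
    contract(A, B, "".join(p)) == reference for p in permutations("abck")
)
```

Even this tiny contraction admits 24 loop orders. On a GPU each order also interacts with choices such as which indices map to thread blocks and threads, which gives a sense of why the parameter space described in the abstract is large enough to call for autotuning.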