Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters

Wenjing Ma, S. Krishnamoorthy, Oreste Villa, K. Kowalski
{"title":"Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters","authors":"Wenjing Ma, S. Krishnamoorthy, Oreste Villa, K. Kowalski","doi":"10.1109/CLUSTER.2010.26","DOIUrl":null,"url":null,"abstract":"Tensor contractions are generalized multidimensional matrix multiplication operations that widely occur in quantum chemistry. Efficient execution of tensor contractions on GPUs requires tackling several challenges to be addressed, including index permutation and small dimension-sizes reducing thread block utilization. In this paper, we present our approach to automatically generate CUDA code to execute tensor contractions on GPUs, including management of data movement between CPU and GPU. GPU-enabled code is generated for the most expensive contractions in CCSD(T), a key coupled cluster method, and incorporated into NW Chem, a popular computational chemistry suite. We demonstrate speedup over a factor of 8.4 using one core per node and over 2.6 when utilizing the entire system using hybrid CPU+GPU solution with 2 GPUs and 5 cores. Finally, we analyze the implementation behavior on future GPU systems.","PeriodicalId":152171,"journal":{"name":"2010 IEEE International Conference on Cluster Computing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLUSTER.2010.26","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

Abstract

Tensor contractions are generalized multidimensional matrix multiplication operations that widely occur in quantum chemistry. Efficient execution of tensor contractions on GPUs requires tackling several challenges to be addressed, including index permutation and small dimension-sizes reducing thread block utilization. In this paper, we present our approach to automatically generate CUDA code to execute tensor contractions on GPUs, including management of data movement between CPU and GPU. GPU-enabled code is generated for the most expensive contractions in CCSD(T), a key coupled cluster method, and incorporated into NW Chem, a popular computational chemistry suite. We demonstrate speedup over a factor of 8.4 using one core per node and over 2.6 when utilizing the entire system using hybrid CPU+GPU solution with 2 GPUs and 5 cores. Finally, we analyze the implementation behavior on future GPU systems.
基于gpgpu的簇上流张量收缩表达式的加速
张量收缩是广义多维矩阵乘法运算,在量子化学中广泛应用。在gpu上有效执行张量收缩需要解决几个挑战,包括索引排列和减少线程块利用率的小维度尺寸。在本文中,我们提出了在GPU上自动生成CUDA代码以执行张量收缩的方法,包括管理CPU和GPU之间的数据移动。支持gpu的代码是为CCSD(T)中最昂贵的收缩生成的,CCSD(T)是一种键耦合簇方法,并将其合并到NW Chem(一种流行的计算化学套件)中。我们演示了在每个节点使用一个内核时加速系数超过8.4,在使用整个系统使用2个GPU和5个内核的混合CPU+GPU解决方案时加速系数超过2.6。最后,我们分析了在未来GPU系统上的实现行为。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信