Adapting Irregular Computations to Large CPU-GPU Clusters in the MADNESS Framework

2012 IEEE International Conference on Cluster Computing. Pub Date: 2012-09-24. DOI: 10.1109/CLUSTER.2012.42

Vlad Slavici, R. Varier, G. Cooperman, R. Harrison

Citations: 3

Abstract

Graphics Processing Units (GPUs) are becoming the workhorse of scalable computations. MADNESS is a scientific framework used especially for computational chemistry. Most MADNESS applications use operators that involve many small tensor computations, which leads to a less regular organization of computations on GPUs: a single GPU kernel may have to multiply by hundreds of small square matrices (with a fixed dimension ranging from 10 to 28). We demonstrate a scalable CPU-GPU implementation of the MADNESS framework over a 500-node partition of the Titan supercomputer. For this hybrid CPU-GPU implementation, we observe up to a 2.3x speedup compared to an equivalent CPU-only implementation with 16 cores per node. For smaller matrices, we demonstrate a 2.2x speedup by using a custom CUDA kernel rather than a cuBLAS-based kernel.
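
To make the workload shape concrete, the following is a minimal CPU-side sketch of one block being multiplied by a batch of small square matrices, as the abstract describes. It is not MADNESS code and not the authors' CUDA kernel; the names `matmul_small`, `batched_multiply`, and `identity`, the batch size of 200, and the scaled-identity operators are all assumptions chosen purely for illustration.

```python
# Illustrative sketch only: NOT MADNESS code and not the authors' CUDA kernel.
# All function names and the test data below are hypothetical.

def matmul_small(a, b):
    """Multiply two k x k matrices given as lists of rows."""
    k = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(k)]
            for i in range(k)]


def batched_multiply(x, operators):
    """Multiply one k x k block x by each small operator matrix.

    Every product is independent of the others, which is why a GPU
    implementation could process the whole batch inside a single kernel
    rather than issuing one launch per product.
    """
    return [matmul_small(x, h) for h in operators]


def identity(k):
    """Return the k x k identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


k = 10       # the abstract gives a fixed dimension between 10 and 28
batch = 200  # the abstract says "hundreds"; 200 is an arbitrary choice

# Input block with easily recognizable entries.
x = [[float(i * k + j) for j in range(k)] for i in range(k)]

# Scaled identities s * I, so each result is simply s * x and easy to check.
operators = [[[s * v for v in row] for row in identity(k)]
             for s in range(1, batch + 1)]

results = batched_multiply(x, operators)
```

Each product costs roughly 2k^3 floating-point operations, which is 2,000 for k = 10 and 43,904 for k = 28. Work items this small are plausibly where per-call overhead in a general-purpose library becomes noticeable, which is consistent with the abstract's comparison between a custom CUDA kernel and a cuBLAS-based one.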
