Scaling Computation on GPUs Using Powerlists

2015 IEEE 22nd International Conference on High Performance Computing Workshops. Pub Date: 2015-12-16. DOI: 10.1109/HiPCW.2015.14

Anshu S. Anand, R. Shyamasundar


Citations: 5

Abstract


With the explosion of big data analytics, scaling linear algebra packages has become extremely important. In the context of GPUs, the cuBLAS API provides a highly efficient package for linear algebra subroutines on a single GPU. Because inputs often have large dimensions, it frequently becomes necessary to compute over clusters. However, the package does not provide facilities for computing efficiently over a cluster of GPUs. In this paper, we demonstrate a high-level framework for scaling linear algebra computations across a cluster of GPUs, using matrix multiplication as the example problem. In particular, we describe a method of specifying matrices using powerlists that captures both parallelism and recursion succinctly, and we automatically schedule partitioned matrices over a GPU cluster so that the product of the partitioned matrices is computed with cuBLAS on each member of the cluster. Our experimental results show significant performance gains of at least 132% for large matrices over a single-GPU computation. The method reflects the map-reduce paradigm: the matrices are mapped to appropriate partitioned matrices and sent to appropriate members of the cluster, and the results are collected to obtain the resultant matrix.
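The abstract does not show the framework itself, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes square matrices whose dimension is a power of two, stored as Python lists of rows. The names `untie`, `unzip`, `split_blocks`, `join_blocks`, and `block_matmul` are hypothetical, chosen for illustration. `untie` and `unzip` invert the two standard powerlist constructors (tie, which concatenates two equal-length lists, and zip, which interleaves them). The leaf product `naive_matmul` is a stand-in for the point where a real system would hand a block pair to cuBLAS on one member of the cluster; no GPU or cluster scheduling is modeled here.

```python
# Illustrative sketch only: pure Python, no GPU, no cuBLAS, no cluster scheduling.
# All function names are hypothetical and not taken from the paper.

def untie(p):
    """Invert the powerlist 'tie' constructor: split p into first and second halves."""
    half = len(p) // 2
    return p[:half], p[half:]

def unzip(p):
    """Invert the powerlist 'zip' constructor: split p into even- and odd-indexed elements."""
    return p[0::2], p[1::2]

def naive_matmul(a, b):
    """Leaf product; stands in for a single-GPU GEMM call in a real system."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def split_blocks(m):
    """Quarter a 2^k x 2^k matrix by applying untie to the rows, then to each row."""
    top, bottom = untie(m)

    def halves(rows):
        pairs = [untie(r) for r in rows]
        return [left for left, _ in pairs], [right for _, right in pairs]

    m11, m12 = halves(top)
    m21, m22 = halves(bottom)
    return m11, m12, m21, m22

def join_blocks(c11, c12, c21, c22):
    """Reassemble four quadrant blocks into one matrix (tie on columns, then on rows)."""
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom

def block_matmul(a, b, leaf=1):
    """Recursive block product; blocks of size <= leaf use the leaf product."""
    if len(a) <= leaf:
        return naive_matmul(a, b)
    a11, a12, a21, a22 = split_blocks(a)
    b11, b12, b21, b22 = split_blocks(b)

    def mul(x, y):
        return block_matmul(x, y, leaf)

    # "Map" step: eight independent block products that could run on different nodes.
    # "Reduce" step: pairwise sums, then reassembly into the result matrix.
    c11 = mat_add(mul(a11, b11), mul(a12, b21))
    c12 = mat_add(mul(a11, b12), mul(a12, b22))
    c21 = mat_add(mul(a21, b11), mul(a22, b21))
    c22 = mat_add(mul(a21, b12), mul(a22, b22))
    return join_blocks(c11, c12, c21, c22)
```

The `leaf` parameter is where a partitioning policy would plug in: a larger value means fewer, bigger block products, which is the trade-off a scheduler would tune against per-node GPU memory. The eight block products in each recursion step are independent of one another, which is what makes the decomposition a natural fit for the map-reduce description in the abstract.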
