CUDA-enabled Hadoop cluster for Sparse Matrix Vector Multiplication

2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS). Pub Date: 2015-07-09. DOI: 10.1109/ReTIS.2015.7232872

M. Reza, Aman Sinha, Rajkumar Nag, P. Mohanty


Citations: 6

Abstract


Compute Unified Device Architecture (CUDA) is an architecture and programming model that exposes the compute power of Graphics Processing Units (GPUs) for general, non-graphical tasks executed in a massively parallel manner. Hadoop is an open-source software framework with its own file system, the Hadoop Distributed File System (HDFS), and its own programming model, MapReduce, which together store very large amounts of data and process them quickly in a distributed manner on a cluster of inexpensive hardware. This paper presents a model and implementation of a hybrid Hadoop-CUDA approach to Sparse Matrix Vector Multiplication (SpMV) on very large matrices. Hadoop splits the input matrix into smaller sub-matrices, stores them on individual data nodes, and then invokes the required CUDA kernels on the GPU-equipped cluster nodes. The SpMV computation itself is performed in CUDA. This implementation was observed to speed up SpMV on very large matrices by a factor of about 1.4 compared with a non-Hadoop, single-GPU CUDA implementation.
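The split-then-multiply structure described in the abstract can be illustrated with a small, CPU-only Python sketch. It is not the authors' implementation: the abstract does not say how the matrix is partitioned or stored, so the row-block split and the CSR (compressed sparse row) layout used here are assumptions, and all function names (`dense_to_csr`, `split_row_blocks`, `spmv_csr`, `distributed_spmv`) are hypothetical. The per-block `spmv_csr` call merely stands in for the CUDA kernel that a GPU node would run.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix to CSR (row_ptr, col_idx, values)."""
    row_ptr, col_idx, values = [0], [], []
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(float(v))
        row_ptr.append(len(values))
    return row_ptr, col_idx, values


def split_row_blocks(csr, block_rows):
    """Split a CSR matrix into row blocks (assumed partitioning scheme).

    Returns a list of (first_row, block_csr) pairs; each block keeps the
    original column indices, so it can be multiplied by the full vector x.
    """
    row_ptr, col_idx, values = csr
    n_rows = len(row_ptr) - 1
    blocks = []
    for start in range(0, n_rows, block_rows):
        end = min(start + block_rows, n_rows)
        lo, hi = row_ptr[start], row_ptr[end]
        local_ptr = [p - lo for p in row_ptr[start:end + 1]]
        blocks.append((start, (local_ptr, col_idx[lo:hi], values[lo:hi])))
    return blocks


def spmv_csr(csr, x):
    """y = A @ x for one CSR block; a CPU stand-in for the per-node CUDA kernel."""
    row_ptr, col_idx, values = csr
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def distributed_spmv(csr, x, block_rows):
    """Multiply block by block and place each partial result at its row offset."""
    n_rows = len(csr[0]) - 1
    y = [0.0] * n_rows
    for start, block in split_row_blocks(csr, block_rows):
        partial = spmv_csr(block, x)  # on a cluster, one task per block
        y[start:start + len(partial)] = partial
    return y


A = [[4, 0, 0, 1],
     [0, 0, 2, 0],
     [0, 3, 0, 0],
     [5, 0, 0, 6]]
x = [1, 2, 3, 4]
print(distributed_spmv(dense_to_csr(A), x, block_rows=2))  # [8.0, 6.0, 6.0, 29.0]
```

With a row-block split, each block produces a disjoint slice of the output vector, so combining partial results is a simple placement by row offset rather than a summation. A column-wise or 2D split would instead require adding partial vectors together.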
