A CUDA-enabled Hadoop cluster for fast distributed image processing

2013 National Conference on Parallel Computing Technologies (PARCOMPTECH). Pub Date: 2013-10-08. DOI: 10.1109/PARCOMPTECH.2013.6621392

Ranajoy Malakar, N. Vydyanathan

Citations: 19

Abstract

Hadoop is a MapReduce-based distributed processing framework that is widely used in industry for big data analysis, particularly text analysis. Graphics processing units (GPUs), on the other hand, are massively parallel platforms with attractive performance-to-price and performance-to-power ratios, and have been used extensively in recent years to accelerate data-parallel computations. CUDA (Compute Unified Device Architecture) is a C-based programming model introduced by NVIDIA for using the parallel computing capabilities of the GPU for general-purpose computation. This paper attempts to integrate CUDA acceleration into the Hadoop distributed processing framework to create a heterogeneous, high-performance image processing system. Because Hadoop is primarily used for text analysis, this also involves making image processing in Hadoop efficient. Our experimental evaluations using an AdaBoost-based face detection algorithm indicate that CUDA-enabling a Hadoop cluster, even with low-end GPUs, can yield a 25% improvement in data processing throughput, which suggests that integrating these two technologies can help build scalable, high-throughput, power- and cost-efficient computing platforms.
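The abstract names AdaBoost-based face detection as the benchmark workload but gives no detail on the detector itself. As a rough illustration of what such a classifier computes per image window, the sketch below applies the standard AdaBoost decision rule: the sign of an alpha-weighted sum of weak-classifier votes. The `Stump` class, the feature vectors, and every numeric value are hypothetical and are not taken from the paper. Each window is scored independently of the others, which is the kind of data-parallel work that a GPU can plausibly accelerate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stump:
    """Hypothetical weak classifier that thresholds a single feature value."""
    feature_index: int  # which feature of the window to inspect
    threshold: float    # decision boundary for that feature
    polarity: int       # +1 or -1; flips the direction of the comparison
    alpha: float        # vote weight, normally learned during AdaBoost training

    def predict(self, features: Sequence[float]) -> int:
        # Returns +1 ("face") or -1 ("not face") for one window.
        value = features[self.feature_index]
        return 1 if self.polarity * value < self.polarity * self.threshold else -1


def strong_classify(stumps: Sequence[Stump], features: Sequence[float]) -> int:
    # AdaBoost decision rule: sign of the alpha-weighted sum of weak votes.
    score = sum(s.alpha * s.predict(features) for s in stumps)
    return 1 if score >= 0 else -1


# Illustrative values only; a real detector learns these from training data.
STUMPS = [
    Stump(feature_index=0, threshold=0.5, polarity=1, alpha=0.9),
    Stump(feature_index=1, threshold=0.2, polarity=-1, alpha=0.4),
    Stump(feature_index=2, threshold=0.7, polarity=1, alpha=0.3),
]

# Each inner list stands in for the feature values of one image window.
WINDOWS = [
    [0.1, 0.9, 0.3],  # all three stumps vote "face"
    [0.8, 0.1, 0.9],  # all three stumps vote "not face"
    [0.1, 0.1, 0.9],  # votes disagree; the heaviest stump decides
]

# Windows are classified independently, so this loop is trivially parallel.
LABELS = [strong_classify(STUMPS, w) for w in WINDOWS]
print(LABELS)
```

In the third window the first stump (weight 0.9) outvotes the other two (combined weight 0.7), which shows how the alpha weights, rather than a simple majority, determine the result.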
