Hardware Accelerated Mappers for Hadoop MapReduce Streaming

Katayoun Neshatpour;Maria Malik;Avesta Sasan;Setareh Rafatirad;Houman Homayoun
{"title":"Hardware Accelerated Mappers for Hadoop MapReduce Streaming","authors":"Katayoun Neshatpour;Maria Malik;Avesta Sasan;Setareh Rafatirad;Houman Homayoun","doi":"10.1109/TMSCS.2018.2854787","DOIUrl":null,"url":null,"abstract":"Heterogeneous architectures have emerged as an effective solution to address the energy-efficiency challenges. This is particularly happening in data centers where the integration of FPGA hardware accelerators with general purpose processors such as big Xeon or little Atom cores introduces enormous opportunities to address the power, scalability, and energy-efficiency challenges of processing emerging applications, in particular in domain of big data. Therefore, the rise of hardware accelerators in data centers, raises several important research questions: What is the potential for hardware acceleration in MapReduce, a defacto standard for big data analytics? What is the role of processor after acceleration; whether big or little core is most suited to run big data applications post hardware acceleration? This paper answers these questions through methodical real-system experiments on state-of-the-art hardware acceleration platforms. We first present the implementation of four highly used big data applications in a heterogeneous CPU+FPGA architecture. We develop the MapReduce implementation of K-means, K nearest neighbor, support vector machine, and naive Bayes in a Hadoop Streaming environment that allows developing mapper functions in a non-Java based language suited for interfacing with FPGA based hardware accelerating environment. We present a full implementation of the HW+SW mappers on existing FPGA+core platform and evaluate how a cluster of CPUs equipped with FPGAs uses the accelerated mapper to enhance the overall performance of MapReduce. Moreover, we study how various parameters at the application, system, and architecture levels affect the performance and power-efficiency benefits of Hadoop streaming hardware acceleration. This analysis helps to better understand how presence of HW accelerators for Hadoop MapReduce, changes the choice of CPU, tuning optimization parameters, and scheduling decisions for performance and energy-efficiency improvement. The results show a promising speedup as well as energy-efficiency gains of upto 5.7× and 16× is achieved, respectively, in an end-to-end Hadoop implementation using a semi-automated HLS framework. Results suggest that HW+SW acceleration yields significantly higher speedup on little cores, reducing the performance gap between little and big cores after the acceleration. On the other hand, the energy-efficiency benefit of HW+SW acceleration is higher on the big cores, which reduces the energy-efficiency gap between little and big cores. 
Overall, the experimental results show that a low cost embedded FPGA platform, programmed using a semi-automated HW+SW co-design methodology, brings significant performance and energy-efficiency gains for Hadoop MapReduce computing in cloud-based architectures and significantly reduces the reliance on large number of big high-performance cores.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"734-748"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2854787","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multi-Scale Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/8410464/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Heterogeneous architectures have emerged as an effective solution to address energy-efficiency challenges. This is particularly true in data centers, where the integration of FPGA hardware accelerators with general-purpose processors such as big Xeon or little Atom cores introduces enormous opportunities to address the power, scalability, and energy-efficiency challenges of processing emerging applications, in particular in the domain of big data. The rise of hardware accelerators in data centers therefore raises several important research questions: What is the potential for hardware acceleration in MapReduce, a de facto standard for big data analytics? What is the role of the processor after acceleration; is a big or a little core better suited to run big data applications after hardware acceleration? This paper answers these questions through methodical real-system experiments on state-of-the-art hardware acceleration platforms. We first present the implementation of four widely used big data applications on a heterogeneous CPU+FPGA architecture. We develop MapReduce implementations of K-means, K nearest neighbor, support vector machine, and naive Bayes in a Hadoop Streaming environment, which allows mapper functions to be developed in a non-Java language suited to interfacing with an FPGA-based hardware acceleration environment. We present a full implementation of the HW+SW mappers on an existing FPGA+core platform and evaluate how a cluster of CPUs equipped with FPGAs uses the accelerated mappers to enhance the overall performance of MapReduce. Moreover, we study how various parameters at the application, system, and architecture levels affect the performance and power-efficiency benefits of Hadoop Streaming hardware acceleration. This analysis helps to better understand how the presence of HW accelerators for Hadoop MapReduce changes the choice of CPU, the tuning of optimization parameters, and scheduling decisions for performance and energy-efficiency improvement. The results show that promising speedup and energy-efficiency gains of up to 5.7× and 16×, respectively, are achieved in an end-to-end Hadoop implementation using a semi-automated HLS framework. The results suggest that HW+SW acceleration yields significantly higher speedup on little cores, reducing the performance gap between little and big cores after acceleration. On the other hand, the energy-efficiency benefit of HW+SW acceleration is higher on the big cores, which reduces the energy-efficiency gap between little and big cores. Overall, the experimental results show that a low-cost embedded FPGA platform, programmed using a semi-automated HW+SW co-design methodology, brings significant performance and energy-efficiency gains for Hadoop MapReduce computing in cloud-based architectures and significantly reduces the reliance on a large number of big high-performance cores.
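To make the Hadoop Streaming mechanism concrete, the sketch below shows a minimal C++ mapper for the K-means assignment step, the kind of non-Java mapper the abstract refers to. It is an illustration only, not the authors' code: the hardcoded 2-D centroids, whitespace-separated input format, and the squared_distance kernel are assumptions, and the kernel simply stands in for the computation that an HLS-generated FPGA accelerator would perform in a HW+SW co-designed mapper.

```cpp
// Minimal sketch of a Hadoop Streaming mapper (K-means assignment step).
// Hadoop Streaming feeds input records on stdin and collects key/value
// pairs from stdout, so any executable can serve as the mapper.
#include <iostream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Distance kernel: in a HW+SW co-designed mapper, a loop like this is the
// natural candidate to synthesize with HLS and offload to the FPGA.
// Here it runs in software; the accelerated interface is not reproduced.
static double squared_distance(const std::vector<double>& a,
                               const std::vector<double>& b) {
    double d = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

int main() {
    // Hypothetical 2-D centroids; a real job would load them from a file
    // shipped to each task (e.g., via the streaming job's distributed cache).
    std::vector<std::vector<double>> centroids = {
        {0.0, 0.0}, {5.0, 5.0}, {10.0, 0.0}
    };

    std::string line;
    while (std::getline(std::cin, line)) {        // one input record per line
        std::istringstream iss(line);
        std::vector<double> point;
        double v;
        while (iss >> v) point.push_back(v);      // whitespace-separated features
        if (point.size() != centroids[0].size()) continue;

        // Assign the point to its nearest centroid.
        size_t best = 0;
        double best_d = std::numeric_limits<double>::max();
        for (size_t c = 0; c < centroids.size(); ++c) {
            double d = squared_distance(point, centroids[c]);
            if (d < best_d) { best_d = d; best = c; }
        }

        // Emit "<centroid id> TAB <point>"; the reducer then recomputes each
        // centroid from the points assigned to it.
        std::cout << best << '\t' << line << '\n';
    }
    return 0;
}
```

Because the mapper only communicates through stdin/stdout, the compiled binary would be attached to a job with the hadoop-streaming jar's -mapper option and shipped to the task nodes alongside any side files; this decoupling is what lets the per-record compute kernel be replaced by an FPGA-accelerated version without changing the MapReduce job structure.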