Main-Memory Requirements of Big Data Applications on Commodity Server Platform

2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) Pub Date : 2018-05-01 DOI:10.1109/CCGRID.2018.00097

Hosein Mohammadi Makrani, S. Rafatirad, A. Houmansadr, H. Homayoun

Citations: 11

Abstract

The emergence of big data frameworks requires computational and memory resources that can naturally scale to manage massive amounts of diverse data. It is currently unclear whether big data frameworks such as Hadoop, Spark, and MPI will require high-bandwidth, large-capacity memory to cope with this change. The primary purpose of this study is to answer this question through an empirical analysis of the different memory configurations available for commodity servers, and to assess the impact of these configurations on the performance of the Hadoop and Spark frameworks and of MPI-based applications. Our results show that neither DRAM capacity, frequency, nor the number of channels plays a critical role in the performance of any of the studied Hadoop applications or of most of the studied Spark applications. However, our results reveal that iterative tasks (e.g., machine learning) in Spark and MPI benefit from high-bandwidth, large-capacity memory.



