SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark

Proceedings of the 12th ACM International Conference on Computing Frontiers. Pub Date: 2015-05-06. DOI: 10.1145/2742854.2747283

Min Li, Jian Tan, Yandong Wang, Li Zhang, V. Salapura

Citations: 177

Abstract

Spark has been increasingly adopted by industry in recent years for big data analysis because it provides a fault-tolerant, scalable, and easy-to-use in-memory abstraction. Moreover, the community has been actively developing a rich ecosystem around Spark, making it even more attractive. However, the literature does not yet offer a Spark-specific benchmark to guide the development and cluster deployment of Spark so that it better fits the resource demands of user applications. In this paper, we present SparkBench, a Spark-specific benchmarking suite that includes a comprehensive set of applications. SparkBench covers four main categories of applications: machine learning, graph computation, SQL queries, and streaming applications. We also characterize the resource consumption, data flow, and timing information of each application, and evaluate the performance impact of a key configuration parameter to guide the design and optimization of the Spark data analytics platform.


