SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark

Min Li, Jian Tan, Yandong Wang, Li Zhang, V. Salapura
{"title":"SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark","authors":"Min Li, Jian Tan, Yandong Wang, Li Zhang, V. Salapura","doi":"10.1145/2742854.2747283","DOIUrl":null,"url":null,"abstract":"Spark has been increasingly adopted by industries in recent years for big data analysis by providing a fault tolerant, scalable and easy-to-use in memory abstraction. Moreover, the community has been actively developing a rich ecosystem around Spark, making it even more attractive. However, there is not yet a Spark specify benchmark existing in the literature to guide the development and cluster deployment of Spark to better fit resource demands of user applications. In this paper, we present SparkBench, a Spark specific benchmarking suite, which includes a comprehensive set of applications. SparkBench covers four main categories of applications, including machine learning, graph computation, SQL query and streaming applications. We also characterize the resource consumption, data flow and timing information of each application and evaluate the performance impact of a key configuration parameter to guide the design and optimization of Spark data analytic platform.","PeriodicalId":417279,"journal":{"name":"Proceedings of the 12th ACM International Conference on Computing Frontiers","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"177","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 12th ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2742854.2747283","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 177

Abstract

Spark has been increasingly adopted by industries in recent years for big data analysis by providing a fault tolerant, scalable and easy-to-use in memory abstraction. Moreover, the community has been actively developing a rich ecosystem around Spark, making it even more attractive. However, there is not yet a Spark specify benchmark existing in the literature to guide the development and cluster deployment of Spark to better fit resource demands of user applications. In this paper, we present SparkBench, a Spark specific benchmarking suite, which includes a comprehensive set of applications. SparkBench covers four main categories of applications, including machine learning, graph computation, SQL query and streaming applications. We also characterize the resource consumption, data flow and timing information of each application and evaluate the performance impact of a key configuration parameter to guide the design and optimization of Spark data analytic platform.
SparkBench:内存数据分析平台Spark的全面基准测试套件
近年来,Spark通过提供容错、可扩展和易于使用的内存抽象,越来越多地被各行业采用,用于大数据分析。此外,社区一直在积极地围绕Spark开发丰富的生态系统,使其更具吸引力。但是,目前文献中还没有一个Spark指定基准来指导Spark的开发和集群部署,以更好地满足用户应用程序的资源需求。在本文中,我们介绍了SparkBench,一个特定于Spark的基准测试套件,其中包括一组全面的应用程序。SparkBench涵盖了四大类应用,包括机器学习、图计算、SQL查询和流应用。我们还描述了每个应用程序的资源消耗、数据流和时序信息,并评估了一个关键配置参数对性能的影响,以指导Spark数据分析平台的设计和优化。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信