abbench:大数据架构堆栈基准

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering Pub Date : 2018-04-02 DOI:10.1145/3185768.3186300

Todor Ivanov, Rekha Singhal

{"title":"abbench:大数据架构堆栈基准","authors":"Todor Ivanov, Rekha Singhal","doi":"10.1145/3185768.3186300","DOIUrl":null,"url":null,"abstract":"Distributed big data processing and analytics applications demand a comprehensive end-to-end architecture stack consisting of big data technologies. However, there are many possible architecture patterns (e.g. Lambda, Kappa or Pipeline architectures) to choose from when implementing the application requirements. A big data technology in isolation may be best performing for a particular application, but its performance in connection with other technologies depends on the connectors and the environment. Similarly, existing big data benchmarks evaluate the performance of different technologies in isolation, but no work has been done on benchmarking big data architecture stacks as a whole. For example, BigBench (TPCx-BB) may be used to evaluate the performance of Spark, but is it applicable to PySpark or to Spark with Kafka stack as well? What is the impact of having different programming environments and/or any other technology like Spark? This vision paper proposes a new category of benchmark, called ABench, to fill this gap and discusses key aspects necessary for the performance evaluation of different big data architecture stacks.","PeriodicalId":10596,"journal":{"name":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","volume":"27 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2018-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"ABench: Big Data Architecture Stack Benchmark\",\"authors\":\"Todor Ivanov, Rekha Singhal\",\"doi\":\"10.1145/3185768.3186300\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distributed big data processing and analytics applications demand a comprehensive end-to-end architecture stack consisting of big data technologies. However, there are many possible architecture patterns (e.g. Lambda, Kappa or Pipeline architectures) to choose from when implementing the application requirements. A big data technology in isolation may be best performing for a particular application, but its performance in connection with other technologies depends on the connectors and the environment. Similarly, existing big data benchmarks evaluate the performance of different technologies in isolation, but no work has been done on benchmarking big data architecture stacks as a whole. For example, BigBench (TPCx-BB) may be used to evaluate the performance of Spark, but is it applicable to PySpark or to Spark with Kafka stack as well? What is the impact of having different programming environments and/or any other technology like Spark? This vision paper proposes a new category of benchmark, called ABench, to fill this gap and discusses key aspects necessary for the performance evaluation of different big data architecture stacks.\",\"PeriodicalId\":10596,\"journal\":{\"name\":\"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering\",\"volume\":\"27 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-04-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3185768.3186300\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion of the 2018 ACM/SPEC International Conference on Performance Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3185768.3186300","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

分布式大数据处理和分析应用需要一个由大数据技术组成的全面的端到端架构堆栈。然而，在实现应用程序需求时，有许多可能的体系结构模式(例如Lambda、Kappa或Pipeline体系结构)可供选择。单独使用的大数据技术可能对特定应用具有最佳性能，但与其他技术结合使用时，其性能取决于连接器和环境。同样，现有的大数据基准是孤立地评估不同技术的性能，但没有对大数据架构堆栈进行整体基准测试。例如，BigBench (TPCx-BB)可以用来评估Spark的性能，但它是否适用于PySpark或带有Kafka堆栈的Spark ?拥有不同的编程环境和/或其他像Spark这样的技术有什么影响?本文提出了一个新的基准类别，称为abbench，以填补这一空白，并讨论了不同大数据架构堆栈性能评估所需的关键方面。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

ABench: Big Data Architecture Stack Benchmark

Distributed big data processing and analytics applications demand a comprehensive end-to-end architecture stack consisting of big data technologies. However, there are many possible architecture patterns (e.g. Lambda, Kappa or Pipeline architectures) to choose from when implementing the application requirements. A big data technology in isolation may be best performing for a particular application, but its performance in connection with other technologies depends on the connectors and the environment. Similarly, existing big data benchmarks evaluate the performance of different technologies in isolation, but no work has been done on benchmarking big data architecture stacks as a whole. For example, BigBench (TPCx-BB) may be used to evaluate the performance of Spark, but is it applicable to PySpark or to Spark with Kafka stack as well? What is the impact of having different programming environments and/or any other technology like Spark? This vision paper proposes a new category of benchmark, called ABench, to fill this gap and discusses key aspects necessary for the performance evaluation of different big data architecture stacks.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Companion of the 2018 ACM/SPEC International Conference on Performance Engineering

自引率

0.00%

发文量