A unified scaling model in the era of big data analytics

Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications; Pub Date: 2019-03-08; DOI: 10.1145/3318265.3318268

Zhongwei Li, Feng Duan, Hao Che


Citations: 1

Abstract

As scale-out execution of big data analytics has become the predominant datacenter workload, it is of paramount importance to faithfully characterize the scaling properties of such workloads. To date, the most widely cited scaling law for big data analytics is the traditional Amdahl's law, which was discovered well before the era of big data analytics. A key observation made in this paper is that both the system and workload models underlying the traditional scaling laws are too simplistic to fully characterize the scaling properties of big data analytics workloads. In this paper, we put forward a Unified Scaling model for Big data Analytics (USBA), based on a multi-stage system model and a discretized workload model. USBA allows for flexible workload scaling, unifying the fixed-size and fixed-time workload models underlying Amdahl's and Gustafson's laws, respectively, and for flexible system scaling in terms of both the number of stages and the degree of parallelism per stage. Moreover, to faithfully characterize the scaling properties of big data analytics workloads, USBA accounts for the variability of task response times and for barrier synchronization. Finally, applying USBA to the scaling analysis of four Spark-based data mining and graph benchmarks demonstrates that USBA is able to adequately characterize the scaling design space and predict the scaling properties of real-world big data analytics workloads. This makes it possible to use USBA as a tool to facilitate job resource provisioning for big data analytics in datacenters.
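For readers unfamiliar with the two classical laws the abstract refers to, the following is a minimal Python sketch of their textbook forms, together with a toy illustration of why barrier synchronization matters. It is not the USBA model, whose equations are not given in the abstract; the function names, the parameter `parallel_fraction`, and the sample task times are assumptions chosen for illustration only.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Textbook Amdahl's law (fixed-size workload).

    Total work stays constant as the number of processing units n grows,
    so the serial part (1 - parallel_fraction) bounds the speedup.
    """
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def gustafson_speedup(parallel_fraction: float, n: int) -> float:
    """Textbook Gustafson's law (fixed-time workload).

    The parallel part of the work grows with n while the run time is held
    constant, so the scaled speedup grows linearly in n.
    """
    return (1.0 - parallel_fraction) + parallel_fraction * n


def barrier_stage_time(task_times: list[float]) -> float:
    """Toy illustration (an assumption, not the paper's model): a stage that
    ends in a barrier completes only when its slowest task completes."""
    return max(task_times)


if __name__ == "__main__":
    p, n = 0.95, 16  # hypothetical values for illustration
    print("Amdahl:", amdahl_speedup(p, n))
    print("Gustafson:", gustafson_speedup(p, n))
    # One straggler dominates the stage even if most tasks are fast.
    print("Stage time:", barrier_stage_time([1.0, 1.2, 3.0]))
```

The contrast between the first two functions is the fixed-size versus fixed-time distinction that the abstract says USBA unifies; the third hints at why variability in task response times can affect scaling once stages are separated by barriers.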

