{"title":"大数据分析时代的统一尺度模型","authors":"Zhongwei Li, Feng Duan, Hao Che","doi":"10.1145/3318265.3318268","DOIUrl":null,"url":null,"abstract":"As scale-out execution of big data analytics has become predominate datacenter workloads, it is of paramount importance to faithfully characterize the scaling properties for such workloads. To date, the most widely cited scaling laws for big data analytics is the traditional Amdahl's law, which was discovered well before the era of big data analytics. A key observation made in this paper is that both the system and workload models underlying the traditional scaling laws are too simplistic to fully characterize the scaling properties for big data analytics workloads. In this paper, we put forward a Unified Scaling model for Big data Analytics (USBA), based on a multi-stage system model and a discretized workload model. USBA allows for flexible workload scaling unifying the fixed-size and fixed-time workload models underlying Amdahl's and Gustafson's laws, respectively, and flexible system scaling in terms of both number of stages and degree of parallelism per stage. Moreover, to faithfully characterize the scaling properties for big data analytics workloads, USBA accounts for variabilities of task response times and barrier synchronization. Finally, application of USBA to the scaling analysis of four Spark-based data mining and graph benchmarks demonstrates that USBA is able to adequately characterize the scaling design space and predict the scaling properties of real-world big data analytics workloads. This makes it possible to use USBA as a useful tool to facilitate job resource provisioning for big data analytics in datacenters.","PeriodicalId":241692,"journal":{"name":"Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A unified scaling model in the era of big data analytics\",\"authors\":\"Zhongwei Li, Feng Duan, Hao Che\",\"doi\":\"10.1145/3318265.3318268\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As scale-out execution of big data analytics has become predominate datacenter workloads, it is of paramount importance to faithfully characterize the scaling properties for such workloads. To date, the most widely cited scaling laws for big data analytics is the traditional Amdahl's law, which was discovered well before the era of big data analytics. A key observation made in this paper is that both the system and workload models underlying the traditional scaling laws are too simplistic to fully characterize the scaling properties for big data analytics workloads. In this paper, we put forward a Unified Scaling model for Big data Analytics (USBA), based on a multi-stage system model and a discretized workload model. USBA allows for flexible workload scaling unifying the fixed-size and fixed-time workload models underlying Amdahl's and Gustafson's laws, respectively, and flexible system scaling in terms of both number of stages and degree of parallelism per stage. Moreover, to faithfully characterize the scaling properties for big data analytics workloads, USBA accounts for variabilities of task response times and barrier synchronization. Finally, application of USBA to the scaling analysis of four Spark-based data mining and graph benchmarks demonstrates that USBA is able to adequately characterize the scaling design space and predict the scaling properties of real-world big data analytics workloads. This makes it possible to use USBA as a useful tool to facilitate job resource provisioning for big data analytics in datacenters.\",\"PeriodicalId\":241692,\"journal\":{\"name\":\"Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-03-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3318265.3318268\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3318265.3318268","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A unified scaling model in the era of big data analytics
As scale-out execution of big data analytics has become predominate datacenter workloads, it is of paramount importance to faithfully characterize the scaling properties for such workloads. To date, the most widely cited scaling laws for big data analytics is the traditional Amdahl's law, which was discovered well before the era of big data analytics. A key observation made in this paper is that both the system and workload models underlying the traditional scaling laws are too simplistic to fully characterize the scaling properties for big data analytics workloads. In this paper, we put forward a Unified Scaling model for Big data Analytics (USBA), based on a multi-stage system model and a discretized workload model. USBA allows for flexible workload scaling unifying the fixed-size and fixed-time workload models underlying Amdahl's and Gustafson's laws, respectively, and flexible system scaling in terms of both number of stages and degree of parallelism per stage. Moreover, to faithfully characterize the scaling properties for big data analytics workloads, USBA accounts for variabilities of task response times and barrier synchronization. Finally, application of USBA to the scaling analysis of four Spark-based data mining and graph benchmarks demonstrates that USBA is able to adequately characterize the scaling design space and predict the scaling properties of real-world big data analytics workloads. This makes it possible to use USBA as a useful tool to facilitate job resource provisioning for big data analytics in datacenters.