{"title":"模拟大数据集群用于系统规划、评估和优化","authors":"Zhaojuan Bian, Kebing Wang, Zhihong Wang, Gene Munce, Illia Cremer, Wei Zhou, Qian Chen, Gen Xu","doi":"10.1109/ICPP.2014.48","DOIUrl":null,"url":null,"abstract":"With the fast development of big data technologies, IT spending on computer clusters is increasing rapidly as well. In order to minimize the cost, architects must plan big data clusters with careful evaluation of various design choices. Current capacity planning methods are mostly trial-and-error or high level estimation based. These approaches, however, are far from efficient, especially with the increasing hardware diversity and software stack complexity. In this paper, we present CSMethod, a novel cluster simulation methodology, to facilitate efficient cluster capacity planning, performance evaluation and optimization, before system provisioning. With our proposed methodology, software stacks are simulated by an abstract yet high fidelity model, Hardware activities derived from software operations are dynamically mapped onto architecture models for processors, memory, storage and networking devices. This hardware/software hybrid methodology allows low overhead, fast and accurate cluster simulation that can be easily carried out on a standard client platform (desktop or laptop). Our experimental results with six popular Hadoop workloads demonstrate that CSMethod can achieve an average error rate of less than six percent, across various software parameters and cluster hardware configurations. We also illustrate the application of the proposed methodology with two real-world use cases: Video-streaming service system planning and Terasort cluster optimization. All our experiments are run on a commodity laptop with execution speeds faster than native executions on a multi-node high-end cluster.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Simulating Big Data Clusters for System Planning, Evaluation, and Optimization\",\"authors\":\"Zhaojuan Bian, Kebing Wang, Zhihong Wang, Gene Munce, Illia Cremer, Wei Zhou, Qian Chen, Gen Xu\",\"doi\":\"10.1109/ICPP.2014.48\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the fast development of big data technologies, IT spending on computer clusters is increasing rapidly as well. In order to minimize the cost, architects must plan big data clusters with careful evaluation of various design choices. Current capacity planning methods are mostly trial-and-error or high level estimation based. These approaches, however, are far from efficient, especially with the increasing hardware diversity and software stack complexity. In this paper, we present CSMethod, a novel cluster simulation methodology, to facilitate efficient cluster capacity planning, performance evaluation and optimization, before system provisioning. With our proposed methodology, software stacks are simulated by an abstract yet high fidelity model, Hardware activities derived from software operations are dynamically mapped onto architecture models for processors, memory, storage and networking devices. This hardware/software hybrid methodology allows low overhead, fast and accurate cluster simulation that can be easily carried out on a standard client platform (desktop or laptop). Our experimental results with six popular Hadoop workloads demonstrate that CSMethod can achieve an average error rate of less than six percent, across various software parameters and cluster hardware configurations. We also illustrate the application of the proposed methodology with two real-world use cases: Video-streaming service system planning and Terasort cluster optimization. All our experiments are run on a commodity laptop with execution speeds faster than native executions on a multi-node high-end cluster.\",\"PeriodicalId\":441115,\"journal\":{\"name\":\"2014 43rd International Conference on Parallel Processing\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 43rd International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPP.2014.48\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 43rd International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2014.48","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Simulating Big Data Clusters for System Planning, Evaluation, and Optimization
With the fast development of big data technologies, IT spending on computer clusters is increasing rapidly as well. In order to minimize the cost, architects must plan big data clusters with careful evaluation of various design choices. Current capacity planning methods are mostly trial-and-error or high level estimation based. These approaches, however, are far from efficient, especially with the increasing hardware diversity and software stack complexity. In this paper, we present CSMethod, a novel cluster simulation methodology, to facilitate efficient cluster capacity planning, performance evaluation and optimization, before system provisioning. With our proposed methodology, software stacks are simulated by an abstract yet high fidelity model, Hardware activities derived from software operations are dynamically mapped onto architecture models for processors, memory, storage and networking devices. This hardware/software hybrid methodology allows low overhead, fast and accurate cluster simulation that can be easily carried out on a standard client platform (desktop or laptop). Our experimental results with six popular Hadoop workloads demonstrate that CSMethod can achieve an average error rate of less than six percent, across various software parameters and cluster hardware configurations. We also illustrate the application of the proposed methodology with two real-world use cases: Video-streaming service system planning and Terasort cluster optimization. All our experiments are run on a commodity laptop with execution speeds faster than native executions on a multi-node high-end cluster.