Simulating Big Data Clusters for System Planning, Evaluation, and Optimization

2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.48

Zhaojuan Bian, Kebing Wang, Zhihong Wang, Gene Munce, Illia Cremer, Wei Zhou, Qian Chen, Gen Xu


Citations: 14

Abstract



With the rapid development of big data technologies, IT spending on computer clusters is also increasing rapidly. To minimize cost, architects must plan big data clusters by carefully evaluating the available design choices. Current capacity planning methods mostly rely on trial and error or high-level estimation. These approaches are far from efficient, especially given increasing hardware diversity and software stack complexity. In this paper, we present CSMethod, a novel cluster simulation methodology that facilitates efficient cluster capacity planning, performance evaluation, and optimization before system provisioning. In the proposed methodology, software stacks are simulated by an abstract yet high-fidelity model, and hardware activities derived from software operations are dynamically mapped onto architecture models for processors, memory, storage, and networking devices. This hybrid hardware/software methodology allows low-overhead, fast, and accurate cluster simulation that can easily be carried out on a standard client platform (desktop or laptop). Our experimental results with six popular Hadoop workloads demonstrate that CSMethod achieves an average error rate of less than six percent across various software parameters and cluster hardware configurations. We also illustrate the application of the methodology with two real-world use cases: video-streaming service system planning and Terasort cluster optimization. All our experiments run on a commodity laptop, with execution speeds faster than native execution on a multi-node high-end cluster.
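The idea of mapping software operations onto hardware models can be illustrated with a toy sketch. This is not CSMethod itself: the abstract does not give the model's internals, so every class, function, parameter, and rate below is a hypothetical stand-in. The sketch assumes a single map-style task that runs its activities serially with no contention, and it assumes "average error rate" means mean relative error between simulated and measured run times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical per-node hardware model; all rates are assumed values."""
    cpu_ops_per_s: float     # sustained compute rate
    disk_bytes_per_s: float  # sequential disk bandwidth
    net_bytes_per_s: float   # network bandwidth


@dataclass(frozen=True)
class Activity:
    """One hardware activity derived from a software operation."""
    resource: str  # "cpu", "disk", or "net"
    amount: float  # operations for cpu, bytes for disk and net


def map_task_activities(split_bytes, ops_per_byte, output_ratio):
    """Software-side model (assumed): break a toy map task into activities."""
    out_bytes = split_bytes * output_ratio
    return [
        Activity("disk", split_bytes),                # read the input split
        Activity("cpu", split_bytes * ops_per_byte),  # process the records
        Activity("disk", out_bytes),                  # spill the map output
        Activity("net", out_bytes),                   # ship output to reducers
    ]


def service_time(activity, node):
    """Hardware-side model (assumed): time = amount / resource rate."""
    rates = {
        "cpu": node.cpu_ops_per_s,
        "disk": node.disk_bytes_per_s,
        "net": node.net_bytes_per_s,
    }
    return activity.amount / rates[activity.resource]


def simulate_task(activities, node):
    """Serial execution without contention or overlap: sum the service times."""
    return sum(service_time(a, node) for a in activities)


def mean_relative_error(simulated, measured):
    """Assumed definition of the average error rate over a set of runs."""
    errors = [abs(s - m) / m for s, m in zip(simulated, measured)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    node = Node(cpu_ops_per_s=1e9, disk_bytes_per_s=100e6, net_bytes_per_s=125e6)
    task = map_task_activities(split_bytes=100e6, ops_per_byte=10, output_ratio=0.5)
    print(f"estimated task time: {simulate_task(task, node):.2f} s")
```

A realistic simulator would also need to model concurrency, queueing on shared resources, and framework overheads; the sketch only shows the separation between a software model that emits activities and hardware models that price them.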
