SkewControl: Gini Out of the Bottle

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI:10.1109/IPDPSW.2014.176

Si Zheng, Yunhuai Liu, T. He, Shanshan Li, Xiangke Liao


Citations: 2

Abstract

In the age of big data, MapReduce plays an important role in extreme-scale data processing systems. Among the open issues, data skew weighs heavily on MapReduce system performance. Traditional approaches leave the issue to users, which requires them to have application-dependent domain knowledge. Other approaches address the issue automatically but in an open-loop manner, which lacks sufficient adaptivity across different applications. To address these issues, we conduct trace-driven empirical studies and show that skew has strongly stable and predictable characteristics, which allows us to design a closed-loop automatic mechanism for task partitioning and scheduling, called SkewControl. We implement SkewControl on top of a Hadoop 1.0.4 production system. The experimental results show that, compared with the state-of-the-art LATE and SkewTune systems, SkewControl consistently improves system response time by 23.8% and 17%, respectively.
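The abstract does not say how SkewControl quantifies skew, although the title alludes to the Gini coefficient. As an illustration only, the following is a minimal sketch that assumes skew is measured as the Gini coefficient of per-task input sizes. The function name `gini` and the sample loads are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def gini(loads: Sequence[float]) -> float:
    """Gini coefficient of non-negative per-task loads.

    0.0 means every task has the same load; values near 1.0 mean
    a few tasks carry almost all of the load (heavy skew).
    Assumption: this is one plausible skew metric, not necessarily
    the one used by SkewControl.
    """
    n = len(loads)
    total = sum(loads)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(loads)
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks i starting at 1 over the ascending-sorted loads.
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    balanced = [10, 10, 10, 10]  # hypothetical reducer input sizes
    skewed = [0, 0, 0, 40]       # one reducer receives everything
    print(gini(balanced))
    print(gini(skewed))
```

A closed-loop scheduler could, under this assumption, recompute such a metric from runtime measurements and repartition work when it exceeds a chosen threshold; the abstract itself gives no such detail.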
