SkewControl: Gini Out of the Bottle

Si Zheng, Yunhuai Liu, T. He, Shanshan Li, Xiangke Liao
Published in: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops
Publication date: 2014-05-19
DOI: 10.1109/IPDPSW.2014.176 (https://doi.org/10.1109/IPDPSW.2014.176)
Citations: 2

Abstract

In the age of big data, MapReduce plays an important role in extreme-scale data processing systems. Among the open issues, data skew weighs heavily on MapReduce system performance. Traditional approaches leave the issue to users, which requires them to possess application-dependent domain knowledge. Other approaches address the issue automatically but in an open-loop manner, which lacks sufficient adaptivity across applications. To address these issues, we conduct trace-driven empirical studies and show that skew has strongly stable and predictable characteristics, which allows us to design a closed-loop automatic mechanism for task partitioning and scheduling, called SkewControl. We implement SkewControl on top of a Hadoop 1.0.4 production system. The experimental results show that, compared with the state-of-the-art LATE and SkewTune systems, SkewControl consistently improves system response time by 23.8% and 17%, respectively.
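The abstract does not say how skew is quantified. As an illustration only, and assuming (from the wordplay in the title) that a Gini coefficient over per-partition input sizes is a reasonable skew measure, the sketch below computes that coefficient. The function name `gini` and the sample sizes are hypothetical and are not taken from the paper; this is not SkewControl's implementation.

```python
def gini(sizes):
    """Gini coefficient of non-negative partition sizes.

    0.0 means perfectly balanced partitions; values near 1.0 mean
    that a few partitions hold almost all of the data.
    Illustrative skew measure only, not the paper's code.
    """
    values = sorted(sizes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending data with 1-based ranks i:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    balanced = [10, 10, 10, 10]   # hypothetical reduce-partition sizes
    skewed = [0, 0, 0, 40]        # all data lands in one partition
    print(gini(balanced))
    print(gini(skewed))
```

With four partitions, the balanced case gives 0.0 and the fully skewed case gives 0.75, which is the maximum of (n - 1) / n for n = 4. A closed-loop controller could, in principle, recompute such a measure after each round and adjust partitioning when it exceeds a chosen threshold, but the actual control rule is not described in the abstract.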