Optimizing Hadoop Framework for Solid State Drives

Jae-Ki Hong, Liang Li, Chihye Han, Bingxu Jin, Qichao Yang, Zilong Yang
Published in: 2016 IEEE International Congress on Big Data (BigData Congress), 2016-06-01
DOI: 10.1109/BigDataCongress.2016.11 (https://doi.org/10.1109/BigDataCongress.2016.11)
Citations: 7

Abstract

Solid state drives (SSDs) have been widely used in Hadoop clusters ever since their introduction to the big data industry. However, the current Hadoop framework is not optimized to take full advantage of SSDs. In this paper, we introduce architectural improvements to the core Hadoop components that exploit the performance benefits of SSDs for data- and compute-intensive workloads. The improved architecture features: a simplified data handling algorithm that uses the high random IOPS of SSDs to store and shuffle the map output data; an accurate pre-read model for HDFS, based on libaio, that reduces read latency and improves request parallelism; a record-size-based reduce scheduler that overcomes the data skew problem in the reduce phase; and a new HDFS block placement policy that uses disk wear information to manage SSD lifetime. The simplified map output collector and the HDFS pre-read model improve performance by 30% and 18% on the Terasort and DFSIO benchmarks, respectively. The modified reduce scheduler runs a real MapReduce application 12% faster. Extending these results, we confirm that the modified structure also achieves a 21% performance improvement on Samsung's MicroBrick-based hyperscale system.
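The abstract does not describe how the record-size-based reduce scheduler works internally. As a rough illustration of the general idea only (balancing reduce work by bytes rather than by partition count), the sketch below uses a generic greedy largest-first assignment. This is a swapped-in textbook heuristic, not the authors' algorithm, and the names `assign_partitions`, `reducer_loads`, and `partition_bytes` are hypothetical.

```python
import heapq


def assign_partitions(partition_bytes, num_reducers):
    """Assign map-output partitions to reducers, balancing total bytes.

    ASSUMPTION: generic greedy largest-first heuristic, used here only to
    illustrate size-aware scheduling; it is not taken from the paper.
    partition_bytes maps a partition id to its total record size in bytes.
    """
    # Min-heap of (bytes assigned so far, reducer id).
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {r: [] for r in range(num_reducers)}

    # Visit partitions from largest to smallest (ties broken by id) and
    # give each one to the reducer that currently holds the fewest bytes.
    ordered = sorted(partition_bytes.items(), key=lambda kv: (-kv[1], kv[0]))
    for pid, size in ordered:
        load, reducer = heapq.heappop(heap)
        assignment[reducer].append(pid)
        heapq.heappush(heap, (load + size, reducer))
    return assignment


def reducer_loads(partition_bytes, assignment):
    """Return the total bytes assigned to each reducer."""
    return {
        r: sum(partition_bytes[p] for p in pids)
        for r, pids in assignment.items()
    }


if __name__ == "__main__":
    # Hypothetical skewed workload: one partition dominates the others.
    sizes = {"p0": 900, "p1": 100, "p2": 100, "p3": 100, "p4": 300, "p5": 300}
    plan = assign_partitions(sizes, num_reducers=2)
    print(plan)
    print(reducer_loads(sizes, plan))
```

Assigning by partition count alone would give each reducer three partitions regardless of size, whereas a size-aware assignment can leave the single large partition on its own reducer and group the small ones together.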