Correlation Based File Prefetching Approach for Hadoop

B. Dong, Xiao Zhong, Q. Zheng, L. Jian, Jian Liu, J. Qiu, Ying Li
Published in: 2010 IEEE Second International Conference on Cloud Computing Technology and Science, 2010-11-30
DOI: 10.1109/CloudCom.2010.60 (https://doi.org/10.1109/CloudCom.2010.60)
Citations: 21

Abstract

Hadoop Distributed File System (HDFS) has been widely adopted to support Internet applications because of its reliable, scalable, and low-cost storage. Blue Sky, one of the most popular e-Learning resource sharing systems in China, uses HDFS to store a massive volume of courseware. However, because the HDFS access mechanism is inefficient, the latency of reading files from HDFS significantly degrades the performance of processing user requests. This paper introduces a two-level correlation-based file prefetching approach that takes the characteristics of HDFS into consideration and improves performance by reducing access latency. Four placement patterns for storing prefetched data are presented, along with policies for trading off performance against efficiency in HDFS prefetching. Moreover, a dynamic replica selection algorithm is investigated to improve the efficiency of HDFS prefetching. The proposed approach has been implemented in Blue Sky, and experimental results show that correlation-based file prefetching can significantly reduce access latency and therefore improve the performance of Hadoop-based Internet applications.
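The abstract does not spell out how file correlations are computed, so the following is only a minimal sketch of the general idea behind correlation-based prefetching, not the paper's algorithm. It assumes a simple successor-count model: each time one file is read right after another, the pair's count is incremented, and a file becomes a prefetch candidate when it follows the current file often enough. The class name `CorrelationPrefetcher`, the `min_confidence` threshold, and the sample paths are all hypothetical. The two-level design, the placement patterns, and the replica selection are not modeled here.

```python
from collections import Counter, defaultdict


class CorrelationPrefetcher:
    """Hypothetical successor-count predictor (illustration only)."""

    def __init__(self, min_confidence=0.5):
        # successors[a][b] = number of times file b was read right after file a
        self.successors = defaultdict(Counter)
        self.min_confidence = min_confidence
        self.last_file = None

    def record_access(self, path):
        """Update correlation counts with one observed file access."""
        if self.last_file is not None and self.last_file != path:
            self.successors[self.last_file][path] += 1
        self.last_file = path

    def predict(self, path, limit=2):
        """Return up to `limit` files worth prefetching after `path`."""
        counts = self.successors.get(path)
        if not counts:
            return []
        total = sum(counts.values())
        # most_common() yields successors in descending count order
        return [
            candidate
            for candidate, count in counts.most_common(limit)
            if count / total >= self.min_confidence
        ]


prefetcher = CorrelationPrefetcher(min_confidence=0.5)
for accessed in ["a.ppt", "b.pdf", "a.ppt", "b.pdf", "a.ppt", "c.doc"]:
    prefetcher.record_access(accessed)

candidates = prefetcher.predict("a.ppt")
print(candidates)
```

In this trace, `b.pdf` follows `a.ppt` in two of three observed transitions, so it clears the 0.5 threshold while `c.doc` does not. A real deployment would also have to decide where prefetched data is kept and which replica to read from, which is what the placement patterns and replica selection in the abstract address.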