Pinchao Liu, Adnan Maruf, F. Yusuf, Labiba Jahan, Hailu Xu, Boyuan Guan, Liting Hu, S. S. Iyengar
Towards Adaptive Replication for Hot/Cold Blocks in HDFS using MemCached
2019 2nd International Conference on Data Intelligence and Security (ICDIS), June 2019. DOI: 10.1109/ICDIS.2019.00035
Citation count: 3
Abstract
With the continued growth of online services, distributed Big Data platforms such as Hadoop and Dryad have gained more attention than ever, and fundamental requirements such as fault tolerance and data availability have become central concerns for these platforms. Data replication policies in Big Data applications are shifting towards dynamic approaches based on file popularity. Dynamically adjusting the replication factor has helped resolve data contention at hotspots and ensure timely data availability. However, empirical observations suggest that file popularity is temporal rather than perpetual: after a certain period, a file's popularity usually fades, and updating its replicas on disk then introduces an I/O bottleneck. To handle such temporally skewed popularity, we propose a dynamic data replication toolset that leverages in-memory processing by integrating a MemCached server into Hadoop. We compare the proposed algorithm with the traditional infrastructure and with a vanilla in-memory algorithm; the experimental results show that the proposed design performs better in terms of throughput and execution time.
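The abstract does not say how popularity is measured or when a block counts as hot, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. Popularity is modeled as an exponentially decayed access count (our choice), a fixed threshold marks a block as hot, and a plain Python dict stands in for the MemCached server. Hot blocks gain an extra in-memory copy while the on-disk replication factor stays fixed, so demoting a block whose popularity has faded frees memory instead of rewriting replicas on disk. The names `AdaptiveReplicator`, `record_access`, `sweep`, `replica_count`, `half_life`, and `hot_threshold` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AdaptiveReplicator:
    """Hypothetical sketch: hot blocks get an extra in-memory copy,
    cold blocks keep only their fixed on-disk replicas."""
    half_life: float = 60.0      # seconds; assumed popularity decay rate
    hot_threshold: float = 5.0   # assumed decayed-score cutoff for "hot"
    disk_replication: int = 3    # HDFS default replication factor, never changed here
    scores: dict = field(default_factory=dict)        # block_id -> (score, last_update)
    memory_cache: dict = field(default_factory=dict)  # stand-in for a MemCached server

    def _decayed(self, block_id, now):
        # Exponential decay (an assumption): the score halves every `half_life` seconds.
        score, last = self.scores.get(block_id, (0.0, now))
        return score * math.pow(0.5, (now - last) / self.half_life)

    def record_access(self, block_id, data, now):
        score = self._decayed(block_id, now) + 1.0
        self.scores[block_id] = (score, now)
        if score >= self.hot_threshold:
            # Promote: keep a copy in memory instead of raising the disk replication factor.
            self.memory_cache[block_id] = data

    def sweep(self, now):
        # Demote blocks whose popularity has faded; only memory is freed, no disk writes.
        for block_id in list(self.memory_cache):
            if self._decayed(block_id, now) < self.hot_threshold:
                del self.memory_cache[block_id]

    def is_hot(self, block_id):
        return block_id in self.memory_cache

    def replica_count(self, block_id):
        # Effective replicas: fixed on-disk copies plus one in-memory copy if hot.
        return self.disk_replication + (1 if self.is_hot(block_id) else 0)
```

For example, five accesses to `blk_1` at the same timestamp raise its score to exactly the threshold of 5 and promote it, giving four effective replicas. A `sweep` ten half-lives later decays the score to about 0.005, so the block is evicted from memory while its three on-disk replicas are left untouched.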