A Novel Technique for Handling Small File Problem of HDFS: Hash Based Archive File (HBAF)

Recent Trends in Intensive Computing. Pub Date: 2021-12-01. DOI: 10.3233/apc210205

Vijay Shankar Sharma, N. Barwar

Citations: 1

Abstract


Nowadays, data is growing exponentially with advances in data science. Every digital footprint generates an enormous amount of data, which is then processed to produce important information for different end-user applications. A number of technologies are available to handle data at this scale, and Hadoop/HDFS is one of them. HDFS handles large files easily, but its performance degrades when it has to deal with a massive number of small files. In this paper we propose a novel technique, Hash Based Archive File (HBAF), that can solve the small file problem of HDFS. The proposed technique can read the final index files partially, which reduces the memory load on the NameNode, and it supports appending files to the archive after it has been created.
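The abstract does not describe the HBAF file layout, so the following is only a toy sketch of the general idea it names: small files are packed into one archive, a hash of the file name selects an index partition so that a lookup touches only that partition, and new files can be appended later. The class name `ToyHashArchive`, the `num_buckets` parameter, and the use of SHA-256 are assumptions made for illustration, not details from the paper, and the model is purely in-memory rather than built on HDFS.

```python
import hashlib


class ToyHashArchive:
    """Toy in-memory model of a hash-partitioned archive.

    This is NOT the paper's HBAF format; it only illustrates partial index
    reads and appending after creation.
    """

    def __init__(self, num_buckets=4):
        # Hypothetical number of index partitions.
        self.num_buckets = num_buckets
        # All small files concatenated into one large blob.
        self.data = bytearray()
        # One index partition per bucket: file name -> (offset, length).
        self.index = [{} for _ in range(num_buckets)]

    def bucket_of(self, name):
        """Map a file name to an index partition via a stable hash."""
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_buckets

    def append(self, name, content):
        """Add a file to the archive, including after earlier files were stored."""
        offset = len(self.data)
        self.data.extend(content)
        self.index[self.bucket_of(name)][name] = (offset, len(content))

    def read(self, name):
        """Look up a file by consulting only the one partition its name hashes to."""
        partition = self.index[self.bucket_of(name)]
        offset, length = partition[name]
        return bytes(self.data[offset:offset + length])


archive = ToyHashArchive(num_buckets=4)
archive.append("a.txt", b"hello")
archive.append("b.txt", b"world!")  # appended after the archive already exists
first = archive.read("a.txt")
second = archive.read("b.txt")
```

In this sketch a lookup never scans the other partitions, which is the sense in which only part of the index needs to be loaded; how HBAF actually organizes and stores its index files is described in the full paper, not here.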
