A Novel Technique for Handling Small File Problem of HDFS: Hash Based Archive File (HBAF)
Vijay Shankar Sharma, N. Barwar
Recent Trends in Intensive Computing, 2021-12-01. DOI: 10.3233/apc210205 (https://doi.org/10.3233/apc210205)
Data volumes are growing exponentially as data science advances. Every digital footprint generates an enormous amount of data, which is then processed to produce important information for different end-user applications. A number of technologies are available to handle data at this scale, and Hadoop/HDFS is one of them. HDFS handles large files easily, but its performance degrades when it has to deal with a massive number of small files. In this paper we propose a novel technique, Hash Based Archive File (HBAF), to solve the small file problem of HDFS. The proposed technique can read the final index files partially, which reduces the memory load on the NameNode, and it supports appending files after the archive has been created.
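The abstract does not give the HBAF file layout, so the following is only an illustrative in-memory sketch of the general idea it names: a hash of the file name selects one index partition, so a lookup touches only part of the index, and new files can be appended after the archive exists. The class `ToyHashArchive`, the helper `bucket_of`, the bucket count, and the use of MD5 are all assumptions made for illustration; none of them is taken from the paper or from the Hadoop API.

```python
import hashlib


def bucket_of(name: str, num_buckets: int) -> int:
    """Map a file name to an index bucket using a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


class ToyHashArchive:
    """Hypothetical archive: one data blob plus a hash-partitioned index."""

    def __init__(self, num_buckets: int = 4):
        self.num_buckets = num_buckets
        self.data = bytearray()  # stands in for the single large archive file
        # Each bucket stands in for one small index file: name -> (offset, length).
        self.buckets = [dict() for _ in range(num_buckets)]

    def append(self, name: str, payload: bytes) -> None:
        """Add a small file; works the same before and after 'creation'."""
        offset = len(self.data)
        self.data.extend(payload)
        self.buckets[bucket_of(name, self.num_buckets)][name] = (offset, len(payload))

    def read(self, name: str) -> bytes:
        """Look up a file by consulting only the one bucket its name hashes to."""
        bucket = self.buckets[bucket_of(name, self.num_buckets)]
        offset, length = bucket[name]
        return bytes(self.data[offset:offset + length])


archive = ToyHashArchive(num_buckets=4)
archive.append("a.txt", b"alpha")
archive.append("b.txt", b"bravo")
archive.append("c.txt", b"charlie")  # appended after the archive already holds files
print(archive.read("b.txt"))
```

In this toy version, `read` consults a single bucket rather than the whole index, which is the property the abstract associates with lower memory load; a real implementation on HDFS would store the buckets and the data as files and would differ in many details.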