A Faster Read and Less Storage Algorithm for Small Files on Hadoop

2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI). Pub Date: 2021-08-01. DOI: 10.1109/ICCEAI52939.2021.00040

Yu Chen, Jun Zhang, Zhicheng Wang, Gejian Liao, Shu Liu, Hai Tan, Guowei Yang, Ying Fang, Shuai Wang, Zhaoqun Sun

Citations: 1

Abstract

Access to massive numbers of small files is a major challenge for the Hadoop Distributed File System (HDFS). To address this problem, we present a new file-archiving algorithm, A Faster Read and Less Storage Algorithm for Small Files on Hadoop. A new logical file name identifies the file that is generated by the pair in the NameNode. Our experiments show that, compared with the original HDFS, the algorithm stores files about 76.6% faster, reads files about 31.9.6% faster, and uses about 73.9% less NameNode memory.
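
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind archive-style approaches to the small-file problem: many small files are merged into one large file, and an index maps each logical file name to its position inside that file, so the NameNode has far fewer objects to track. The names `pack`, `read`, and `Index` are hypothetical, the archive lives in memory rather than on HDFS, and none of this should be read as the authors' actual method.

```python
import io
from typing import Dict, Tuple

# Hypothetical index: logical file name -> (byte offset, length) inside the archive.
# This layout is an assumption for illustration, not the paper's data structure.
Index = Dict[str, Tuple[int, int]]


def pack(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one archive and record where each one sits."""
    archive = io.BytesIO()
    index: Index = {}
    for name, data in files.items():
        index[name] = (archive.tell(), len(data))  # offset before writing
        archive.write(data)
    return archive.getvalue(), index


def read(archive: bytes, index: Index, name: str) -> bytes:
    """Look up a logical file name and slice its bytes out of the archive."""
    offset, length = index[name]
    return archive[offset:offset + length]


files = {"a.txt": b"hello", "b.txt": b"world!"}
archive, index = pack(files)
print(read(archive, index, "b.txt"))
```

In this sketch a read costs one index lookup and one slice, and the storage layer sees a single large object instead of one entry per small file; that is the usual motivation for archiving, independent of how the paper builds its logical file names.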
