SequenceFile Storage Optimization Method Based on Pile Structure
Wenjing Wu, Huiyi Liu, Liting Duan, Si-Yuan Xu
2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), published 2021-06-28
DOI: 10.1109/ICAICA52286.2021.9498062 (https://doi.org/10.1109/ICAICA52286.2021.9498062)
Citation count: 1
Abstract
Hadoop Distributed File System (HDFS) performs well when storing and managing large data sets, but its performance degrades significantly when it handles massive numbers of small files. To address this problem, a SequenceFile storage optimization method based on the pile structure (OPSS) is proposed. The method takes the pile as the unit of work and merges the small files in each pile into SequenceFiles using a worst-fit strategy, reducing the number of data blocks consumed by the same number of small files. In addition, the method stores the index information of the small files in a global index file and accesses the small files through that file, which improves access efficiency and reduces the memory occupied by the index file. Experimental results show that this method effectively improves the performance of HDFS when accessing large numbers of small files.
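
The worst-fit merging and global index described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the abstract does not say how piles are formed or how the index file is laid out, so the sketch assumes that one pile is already given as a mapping from file name to size, that each merged file is capped at one HDFS block (128 MiB is assumed here), and that an index entry is a hypothetical `(merged_file_id, offset, length)` tuple. The names `MergedFile` and `worst_fit_merge` are illustrative only, and SequenceFile header and record overhead is ignored.

```python
from dataclasses import dataclass, field

# Assumption: cap each merged file at one HDFS block (128 MiB is a common default).
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class MergedFile:
    """Hypothetical stand-in for one SequenceFile being filled with small files."""
    file_id: int
    capacity: int
    used: int = 0
    entries: list = field(default_factory=list)  # (name, offset, length)

    @property
    def free(self) -> int:
        return self.capacity - self.used


def worst_fit_merge(pile: dict, capacity: int = BLOCK_SIZE):
    """Pack the small files of one pile into merged files using worst fit.

    pile maps file name -> size in bytes. Returns (merged_files, global_index),
    where global_index maps file name -> (merged_file_id, offset, length).
    """
    merged = []
    global_index = {}
    for name, size in pile.items():
        if size > capacity:
            raise ValueError(f"{name} is larger than one merged file")
        # Worst fit: pick the merged file with the MOST remaining space.
        target = max(merged, key=lambda m: m.free, default=None)
        if target is None or target.free < size:
            # Even the emptiest merged file cannot hold it, so open a new one.
            target = MergedFile(file_id=len(merged), capacity=capacity)
            merged.append(target)
        offset = target.used
        target.entries.append((name, offset, size))
        target.used += size
        global_index[name] = (target.file_id, offset, size)
    return merged, global_index
```

Looking up a small file then takes a single dictionary access, for example `global_index["c"]`, which yields the merged file to open and the byte range to read. A real system would persist this mapping as the global index file instead of holding it in a Python dictionary.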