{"title":"SFS: Hadoop中的一个大型小文件处理中间件","authors":"Yonghua Huo, Zhihao Wang, XiaoXiao Zeng, Yang Yang, Wenjing Li, Cheng Zhong","doi":"10.1109/APNOMS.2016.7737234","DOIUrl":null,"url":null,"abstract":"HDFS is designed for storing large files, but it suffered performance penalty when storing large amount of small files such as the space occupied by the metadata cause high consumption of NameNode and low efficiency of file reading. Currently, there are many approaches implemented to solve the small file problem. In this paper we use additional hardware named SFS (Small File Server) between users and HDFS to solve the small file problem. The proposed approach includes a file merging algorithm based on temporal continuity, an index structure to retrieve small files and a prefetching mechanism to improve the performance of file reading and writing. The experimental results show that the proposed approach efficiently optimizes small files storing in HDFS with reducing the overload of NameNode and improving the performance of file accessing.","PeriodicalId":194123,"journal":{"name":"2016 18th Asia-Pacific Network Operations and Management Symposium (APNOMS)","volume":"217 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"SFS: A massive small file processing middleware in Hadoop\",\"authors\":\"Yonghua Huo, Zhihao Wang, XiaoXiao Zeng, Yang Yang, Wenjing Li, Cheng Zhong\",\"doi\":\"10.1109/APNOMS.2016.7737234\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"HDFS is designed for storing large files, but it suffered performance penalty when storing large amount of small files such as the space occupied by the metadata cause high consumption of NameNode and low efficiency of file reading. Currently, there are many approaches implemented to solve the small file problem. In this paper we use additional hardware named SFS (Small File Server) between users and HDFS to solve the small file problem. The proposed approach includes a file merging algorithm based on temporal continuity, an index structure to retrieve small files and a prefetching mechanism to improve the performance of file reading and writing. 
The experimental results show that the proposed approach efficiently optimizes small files storing in HDFS with reducing the overload of NameNode and improving the performance of file accessing.\",\"PeriodicalId\":194123,\"journal\":{\"name\":\"2016 18th Asia-Pacific Network Operations and Management Symposium (APNOMS)\",\"volume\":\"217 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 18th Asia-Pacific Network Operations and Management Symposium (APNOMS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/APNOMS.2016.7737234\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 18th Asia-Pacific Network Operations and Management Symposium (APNOMS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APNOMS.2016.7737234","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SFS: A massive small file processing middleware in Hadoop
HDFS is designed for storing large files, but it suffers a performance penalty when storing large numbers of small files: the metadata for each file consumes NameNode memory, and file reading becomes inefficient. Many approaches have been proposed to address the small file problem. In this paper we place an additional server, named SFS (Small File Server), between users and HDFS to solve the small file problem. The proposed approach includes a file merging algorithm based on temporal continuity, an index structure for retrieving small files, and a prefetching mechanism to improve file read and write performance. The experimental results show that the proposed approach efficiently optimizes the storage of small files in HDFS, reducing the load on the NameNode and improving file access performance.
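
To make the merging idea concrete, the following is a minimal, self-contained Java sketch (not the authors' implementation) of how small files might be grouped into merge batches by temporal continuity and tracked in an index. The class names, the 5-second arrival-gap threshold, and the 128 MB batch cap are illustrative assumptions, not values taken from the paper.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch (assumed, not the paper's code) of grouping small files
 * into merge batches by temporal continuity: files whose arrival times are
 * close together are packed into the same merged file, and an in-memory index
 * records which merged file holds each small file and at what offset.
 */
public class TemporalMergeSketch {

    /** A small file queued for merging. */
    static class SmallFile {
        final String name;
        final long arrivalMillis;
        final long sizeBytes;
        SmallFile(String name, long arrivalMillis, long sizeBytes) {
            this.name = name;
            this.arrivalMillis = arrivalMillis;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Index entry: which merged file holds a small file, and where. */
    static class IndexEntry {
        final String mergedFile;
        final long offset;
        final long length;
        IndexEntry(String mergedFile, long offset, long length) {
            this.mergedFile = mergedFile;
            this.offset = offset;
            this.length = length;
        }
    }

    // Assumed thresholds: files arriving within 5 s of each other are treated
    // as temporally continuous; a merged file stops growing once it approaches
    // a typical 128 MB HDFS block.
    static final long MAX_GAP_MILLIS = 5_000;
    static final long MAX_MERGED_BYTES = 128L * 1024 * 1024;

    /** Group files (sorted by arrival time) into merge batches and build the index. */
    static Map<String, IndexEntry> buildIndex(List<SmallFile> filesByArrival) {
        Map<String, IndexEntry> index = new HashMap<>();
        int batchId = 0;
        long batchBytes = 0;
        long lastArrival = Long.MIN_VALUE;
        for (SmallFile f : filesByArrival) {
            // Start a new batch on the first file, after a long arrival gap,
            // or when the current batch would exceed the size cap.
            boolean newBatch = lastArrival == Long.MIN_VALUE
                    || f.arrivalMillis - lastArrival > MAX_GAP_MILLIS
                    || batchBytes + f.sizeBytes > MAX_MERGED_BYTES;
            if (newBatch) {
                batchId++;
                batchBytes = 0;
            }
            String mergedName = "merged-" + batchId + ".seq";
            index.put(f.name, new IndexEntry(mergedName, batchBytes, f.sizeBytes));
            batchBytes += f.sizeBytes;
            lastArrival = f.arrivalMillis;
        }
        return index;
    }

    public static void main(String[] args) {
        List<SmallFile> files = new ArrayList<>();
        files.add(new SmallFile("a.log", 0, 4_096));
        files.add(new SmallFile("b.log", 1_000, 8_192));
        files.add(new SmallFile("c.log", 60_000, 2_048)); // large gap, so a new batch starts
        buildIndex(files).forEach((name, e) ->
                System.out.println(name + " -> " + e.mergedFile + " @ " + e.offset));
    }
}

In a real deployment, the merged batches would be written to HDFS and the index kept on the SFS node; the prefetching mechanism could then warm a cache with the remaining entries of a merged file whenever one of its members is read, exploiting the same temporal locality that drove the merge.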