An Optimized Approach for Storing Small Files on HDFS-based on Dynamic Queue
Authors: Weipeng Jing, Danyu Tong
Published in: 2016 International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI), 2016-10-01
DOI: 10.1109/IIKI.2016.55 (https://doi.org/10.1109/IIKI.2016.55)
With the rapid development of social networks, there is an urgent need to handle massive numbers of small files effectively. Unfortunately, HDFS (Hadoop Distributed File System) performs poorly on massive small files because they place a heavy burden on the NameNode and lead to poor read performance. To solve this problem, this paper proposes a method called DQSF (Dynamic Queue of Small File). DQSF assigns files of different sizes to appropriate queues, which serve as the basis for merging small files. The method is based on the Analytic Hierarchy Process: it computes a comprehensive system index from file reading, memory usage, and merging efficiency, and takes the queue size that yields the best performance as the dynamic queue value for the corresponding size range. Before small files are merged, a text categorization algorithm based on period features is applied. In addition, a hierarchical index and an index prefetching mechanism improve file reading speed. Experimental results show that this strategy reduces memory usage and improves the efficiency of accessing massive small files, resulting in a substantial improvement in system performance.
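
The idea of routing files into per-size-range queues and merging each queue once it fills can be illustrated with a minimal sketch. This is not the DQSF implementation: the abstract gives no size ranges, queue capacities, or merge format, so the class names (`MergeQueue`, `DynamicQueues`), the byte thresholds, and the `(name, offset, length)` index entries below are all hypothetical. In the paper the queue size per range is derived from the Analytic Hierarchy Process; here it is simply passed in as a fixed number.

```python
from dataclasses import dataclass, field


@dataclass
class MergeQueue:
    """Buffers small files of one size range until the queue capacity is reached."""
    capacity: int                                  # hypothetical queue size in bytes
    pending: list = field(default_factory=list)    # (name, offset, length) entries
    used: int = 0

    def add(self, name, size):
        """Queue a file; return the previous batch if this file would not fit."""
        batch = None
        if self.pending and self.used + size > self.capacity:
            batch = self.flush()
        # The offset is the file's position inside the future merged file.
        self.pending.append((name, self.used, size))
        self.used += size
        return batch

    def flush(self):
        """Hand over the queued entries as one merge batch and reset the queue."""
        batch, self.pending, self.used = self.pending, [], 0
        return batch


class DynamicQueues:
    """Routes each file to the queue whose size range contains it."""

    def __init__(self, ranges):
        # ranges: (upper_bound_bytes, queue_capacity_bytes) pairs, assumed values
        self.queues = [(upper, MergeQueue(cap)) for upper, cap in sorted(ranges)]

    def submit(self, name, size):
        for upper, queue in self.queues:
            if size <= upper:
                return queue.add(name, size)
        raise ValueError(f"{name} ({size} bytes) is not a small file")


if __name__ == "__main__":
    # Two assumed ranges: files up to 1 KiB and files up to 4 KiB.
    queues = DynamicQueues([(1024, 2048), (4096, 8192)])
    for name, size in [("a.txt", 1000), ("b.txt", 1000), ("c.txt", 1000)]:
        batch = queues.submit(name, size)
        if batch is not None:
            print("merge batch:", batch)
```

Each returned batch is a list of index entries for one merged file; a real system would write the corresponding bytes to HDFS as a single large file and keep the entries in an index so that individual small files can be located by offset.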