Title: Optimizing the storage of massive electronic pedigrees in HDFS
Authors: Yin Zhang, Weili Han, Wei Wang, Chang Lei
Venue: 2012 3rd IEEE International Conference on the Internet of Things
Publication date: 2012-10-01
DOI: 10.1109/IOT.2012.6402306 (https://doi.org/10.1109/IOT.2012.6402306)
Citations: 12
Abstract
Because it enables trustworthy tracking of processes across the production, processing, storage, transportation, and sale phases, the electronic pedigree system has become an important technology of the Internet of Things. Such a system generates, stores, and retrieves a huge volume of small electronic pedigrees in XML format. Unfortunately, the storage of these massive electronic pedigrees, which take the form of small XML files, has rarely been studied. We therefore leverage Hadoop to solve this storage problem by optimizing how massive numbers of small XML files are stored and accessed in HDFS. First, all correlated small XML files of the same envelope are merged into a larger file to reduce the metadata footprint at the NameNode. Second, a prefetching mechanism and a remerging mechanism improve the efficiency of accessing small XML files. Finally, we implement a prototype and evaluate its effectiveness and efficiency against the original HDFS. The results show that the optimized approach reduces the memory consumption of NameNodes by up to 50%, improves storing performance by up to 91%, and accelerates access by up to 88% in Hadoop.
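
The merging and prefetching ideas can be illustrated with a minimal, in-memory sketch. It is not the authors' implementation: the abstract does not specify the merged-file layout or index format, so the `(offset, length)` index and the function names `merge_small_files`, `read_small_file`, and `prefetch_envelope` are assumptions made for illustration, and plain `bytes` stand in for HDFS files.

```python
from typing import Dict, Tuple

# Hypothetical index: small-file name -> (offset, length) inside the merged blob.
Index = Dict[str, Tuple[int, int]]


def merge_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate the small files of one envelope into a single blob.

    One large file needs far fewer NameNode metadata entries than many
    small ones; the index records where each original file lives.
    """
    merged = bytearray()
    index: Index = {}
    for name, payload in files.items():
        index[name] = (len(merged), len(payload))
        merged.extend(payload)
    return bytes(merged), index


def read_small_file(merged: bytes, index: Index, name: str) -> bytes:
    """Recover one small file from the merged blob using its index entry."""
    offset, length = index[name]
    return merged[offset:offset + length]


def prefetch_envelope(merged: bytes, index: Index) -> Dict[str, bytes]:
    """Simplified prefetch: slice out every file of the envelope at once,
    so later reads of correlated files are served from this cache."""
    return {name: merged[o:o + n] for name, (o, n) in index.items()}
```

In this sketch a reader that requests one pedigree could call `prefetch_envelope` once and serve subsequent requests for files of the same envelope from the returned dictionary, which mirrors the intent of prefetching correlated files.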