{"title":"通过数据压缩提高Hadoop MapReduce性能:使用wordcount job的研究","authors":"Kritwara Rattanaopas, S. Kaewkeeree","doi":"10.1109/ECTICON.2017.8096300","DOIUrl":null,"url":null,"abstract":"Hadoop cluster is widely used for executing and analyzing a large data like big data. It has MapReduce engine for distributing data to each node in cluster. Compression is a benefit way of Hadoop cluster because it not only can increase space of storage but also improve performance to compute job. Recently, there are some popular Hadoop's compression codecs for example; deflate, gzip, bzip2 and snappy. An over-all compression in MapReduce, Hadoop uses a compressed input file which is gzip and bzip2. This research goal is to improve a computing performance of wordcount job using a different Hadoop compression option. We have 2 scenarios had been test in a study as follows: Scenario I, we use data compression with map output, results found the better execution-time with only snappy and deflate in a raw-text input file. It refers to compression of map output which cans not improve a computing performance than uncompressed. Scenario II, we use a compressed input file with bzip2 with the uncompressed MapReduce that results find a similar execution-time between raw-text and bzip2. It refers to a bzip2 input file can reduce a disk space and keep a computing performance. In concluding, Hadoop compression can investigate the wordcount MapReduce execution-time with a bzip2 input file in Hadoop cluster.","PeriodicalId":273911,"journal":{"name":"2017 14th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"Improving Hadoop MapReduce performance with data compression: A study using wordcount job\",\"authors\":\"Kritwara Rattanaopas, S. Kaewkeeree\",\"doi\":\"10.1109/ECTICON.2017.8096300\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hadoop cluster is widely used for executing and analyzing a large data like big data. It has MapReduce engine for distributing data to each node in cluster. Compression is a benefit way of Hadoop cluster because it not only can increase space of storage but also improve performance to compute job. Recently, there are some popular Hadoop's compression codecs for example; deflate, gzip, bzip2 and snappy. An over-all compression in MapReduce, Hadoop uses a compressed input file which is gzip and bzip2. This research goal is to improve a computing performance of wordcount job using a different Hadoop compression option. We have 2 scenarios had been test in a study as follows: Scenario I, we use data compression with map output, results found the better execution-time with only snappy and deflate in a raw-text input file. It refers to compression of map output which cans not improve a computing performance than uncompressed. Scenario II, we use a compressed input file with bzip2 with the uncompressed MapReduce that results find a similar execution-time between raw-text and bzip2. It refers to a bzip2 input file can reduce a disk space and keep a computing performance. 
In concluding, Hadoop compression can investigate the wordcount MapReduce execution-time with a bzip2 input file in Hadoop cluster.\",\"PeriodicalId\":273911,\"journal\":{\"name\":\"2017 14th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 14th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ECTICON.2017.8096300\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 14th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECTICON.2017.8096300","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving Hadoop MapReduce performance with data compression: A study using wordcount job
Hadoop clusters are widely used for executing and analyzing large data sets such as big data. Hadoop's MapReduce engine distributes data to each node in the cluster. Compression benefits a Hadoop cluster because it not only saves storage space but can also improve the performance of compute jobs. Popular Hadoop compression codecs include deflate, gzip, bzip2, and snappy. For end-to-end compression in MapReduce, Hadoop can read compressed input files in formats such as gzip and bzip2. The goal of this research is to improve the computing performance of a wordcount job using different Hadoop compression options. Two scenarios were tested in this study. In Scenario I, we compress the map output: only snappy and deflate gave competitive execution times on a raw-text input file, and even these did not improve computing performance over uncompressed map output. In Scenario II, we use a bzip2-compressed input file with an otherwise uncompressed MapReduce job: the results show similar execution times for raw text and bzip2, indicating that a bzip2 input file can reduce disk usage while preserving computing performance. In conclusion, with a bzip2 input file, Hadoop compression preserves wordcount MapReduce execution time on a Hadoop cluster while saving disk space.
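
For readers who want to reproduce the study, both scenarios map onto standard Hadoop MapReduce job configuration. The sketch below is a minimal wordcount driver, not the authors' actual code: the class name, job name, and two-argument input/output layout are illustrative assumptions. It shows how Scenario I's map-output compression is enabled via the mapreduce.map.output.compress properties, and notes in a comment why Scenario II's bzip2 input requires no special handling.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountCompression {

  // Standard wordcount mapper: emits (token, 1) for every token in a line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Standard wordcount reducer/combiner: sums the counts for each token.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Scenario I: compress the intermediate map output. Swap SnappyCodec for
    // DeflateCodec, GzipCodec, or BZip2Codec to compare the other codecs.
    conf.setBoolean("mapreduce.map.output.compress", true);
    conf.setClass("mapreduce.map.output.compress.codec",
                  SnappyCodec.class, CompressionCodec.class);

    Job job = Job.getInstance(conf, "wordcount");
    job.setJarByClass(WordCountCompression.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Scenario II: a bzip2-compressed input (e.g. input.txt.bz2) needs no extra
    // configuration here; Hadoop selects the codec from the file extension, and
    // bzip2 (unlike gzip) is splittable, so the input can still be parallelized.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The job would then be run as, for example, hadoop jar wordcount.jar WordCountCompression /data/input.txt.bz2 /data/out, with Hadoop choosing BZip2Codec automatically from the .bz2 extension. The splittability of bzip2 is one reason it is attractive as a compressed input format despite its relatively slow codec.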