{"title":"动态减少hadoop工作负载的任务调整","authors":"Vaggelis Antypas, Nikos Zacheilas, V. Kalogeraki","doi":"10.1145/2801948.2801953","DOIUrl":null,"url":null,"abstract":"In recent years, we observe an increasing demand for systems that are capable of efficiently managing and processing huge amounts of data. Apache's Hadoop, an open-source implementation of Google's MapReduce programming model, has emerged as one of the most popular systems for Big Data processing and is supported by major companies like Facebook, Yahoo! and Amazon. One of the most challenging aspects of executing a Hadoop job, is to configure appropriately the number of reduce tasks. The problem is exacerbated when multiple jobs are executing concurrently competing for the available system resources. Our approach consists of the following components: (i) an algorithm for computing the appropriate number of reduce tasks per job, (ii) the usage of profiler-jobs for gathering information necessary for the reduce task computation and (iii) two different policies for fragmenting the reduce tasks to the available system resources when multiple jobs execute concurrently in the cluster. Our detailed experimental evaluation using traffic monitoring Hadoop jobs on our local cluster, illustrates that our approach is practical and exhibits solid performance.","PeriodicalId":305252,"journal":{"name":"Proceedings of the 19th Panhellenic Conference on Informatics","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Dynamic reduce task adjustment for hadoop workloads\",\"authors\":\"Vaggelis Antypas, Nikos Zacheilas, V. Kalogeraki\",\"doi\":\"10.1145/2801948.2801953\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, we observe an increasing demand for systems that are capable of efficiently managing and processing huge amounts of data. Apache's Hadoop, an open-source implementation of Google's MapReduce programming model, has emerged as one of the most popular systems for Big Data processing and is supported by major companies like Facebook, Yahoo! and Amazon. One of the most challenging aspects of executing a Hadoop job, is to configure appropriately the number of reduce tasks. The problem is exacerbated when multiple jobs are executing concurrently competing for the available system resources. Our approach consists of the following components: (i) an algorithm for computing the appropriate number of reduce tasks per job, (ii) the usage of profiler-jobs for gathering information necessary for the reduce task computation and (iii) two different policies for fragmenting the reduce tasks to the available system resources when multiple jobs execute concurrently in the cluster. 
Our detailed experimental evaluation using traffic monitoring Hadoop jobs on our local cluster, illustrates that our approach is practical and exhibits solid performance.\",\"PeriodicalId\":305252,\"journal\":{\"name\":\"Proceedings of the 19th Panhellenic Conference on Informatics\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 19th Panhellenic Conference on Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2801948.2801953\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th Panhellenic Conference on Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2801948.2801953","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dynamic reduce task adjustment for hadoop workloads
Vaggelis Antypas, Nikos Zacheilas, V. Kalogeraki
Proceedings of the 19th Panhellenic Conference on Informatics, October 2015
DOI: 10.1145/2801948.2801953
In recent years, we have observed an increasing demand for systems capable of efficiently managing and processing huge amounts of data. Apache's Hadoop, an open-source implementation of Google's MapReduce programming model, has emerged as one of the most popular systems for Big Data processing and is backed by major companies like Facebook, Yahoo! and Amazon. One of the most challenging aspects of executing a Hadoop job is configuring the number of reduce tasks appropriately. The problem is exacerbated when multiple jobs execute concurrently, competing for the available system resources. Our approach consists of the following components: (i) an algorithm for computing the appropriate number of reduce tasks per job, (ii) the use of profiler jobs to gather the information needed for the reduce-task computation, and (iii) two different policies for distributing the reduce tasks over the available system resources when multiple jobs execute concurrently in the cluster. Our detailed experimental evaluation, using traffic-monitoring Hadoop jobs on our local cluster, illustrates that our approach is practical and exhibits solid performance.
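For context, the knob this abstract centers on is Hadoop's per-job reduce-task count, which a client sets through the standard MapReduce API (Job.setNumReduceTasks). The minimal Java sketch below shows how a computed value might be applied to a job; the sizing heuristic (estimated intermediate bytes divided by a target bytes-per-reducer, capped by available reduce slots), the class and method names, and the numeric estimates are all illustrative assumptions, not the authors' algorithm or profiler.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReduceTaskSizing {

    // Hypothetical heuristic (NOT the paper's algorithm): one reduce task
    // per chunk of intermediate data, bounded by the cluster's reduce slots.
    static int suggestReduceTasks(long intermediateBytes,
                                  long targetBytesPerReducer,
                                  int availableReduceSlots) {
        int byDataSize = (int) Math.max(1L, intermediateBytes / targetBytesPerReducer);
        return Math.min(byDataSize, availableReduceSlots);
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "traffic-monitoring");

        // Estimates a profiler run might supply (assumed values for illustration).
        long estimatedIntermediateBytes = 8L * 1024 * 1024 * 1024; // 8 GiB of map output
        long targetBytesPerReducer = 1L * 1024 * 1024 * 1024;      // 1 GiB per reducer
        int reduceSlots = 6;                                       // cluster capacity

        int reducers = suggestReduceTasks(estimatedIntermediateBytes,
                                          targetBytesPerReducer,
                                          reduceSlots);
        job.setNumReduceTasks(reducers); // the per-job knob being adjusted
    }
}

Set too low, this value serializes the reduce phase behind a few overloaded tasks; set too high, it wastes scheduling and startup overhead on tiny tasks, which is why the paper argues for computing it per job from profiled estimates rather than fixing it statically.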