ActCap: Accelerating MapReduce on heterogeneous clusters with capability-aware data placement
Bo Wang, Jinlei Jiang, Guangwen Yang
2015 IEEE Conference on Computer Communications (INFOCOM), published 2015-08-24
DOI: 10.1109/INFOCOM.2015.7218509 (https://doi.org/10.1109/INFOCOM.2015.7218509)
MapReduce, a widely used programming model and implementation for processing large data sets, performs poorly on heterogeneous clusters, which, unfortunately, are common in current computing environments. To address this problem, the paper: 1) analyzes the causes of the performance degradation and identifies the key one as the large volume of inter-node data transfer that results from distributing data evenly among nodes with different computing capabilities, and 2) proposes ActCap, a solution that uses a Markov-chain-based model to perform node-capability-aware data placement for continuously incoming data. ActCap has been incorporated into Hadoop and evaluated on a 24-node heterogeneous cluster with 13 benchmarks. The experimental results show that ActCap reduces the percentage of inter-node data transfer from 32.9% to 7.7% and achieves an average speedup of 49.8% over Hadoop and an average speedup of 9.8% over Tarazu, the latest related work.
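The intuition behind capability-aware placement can be illustrated with a minimal sketch. This is not ActCap's Markov-chain-based model, which the abstract does not detail; it is a plain proportional split, assuming each node has a single numeric capability score. The function names `place_blocks` and `even_blocks`, the node names, and the scores are all hypothetical.

```python
from typing import Dict


def place_blocks(capabilities: Dict[str, float], num_blocks: int) -> Dict[str, int]:
    """Split num_blocks across nodes in proportion to capability scores.

    Illustrative only: a simple proportional split (largest-remainder
    rounding), NOT the Markov-chain-based model proposed in the paper.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    total = sum(capabilities.values())
    if total <= 0:
        raise ValueError("capability scores must sum to a positive value")

    # Ideal (fractional) share of blocks for each node.
    ideal = {node: num_blocks * cap / total for node, cap in capabilities.items()}
    # Start from the integer part of each share.
    alloc = {node: int(share) for node, share in ideal.items()}
    leftover = num_blocks - sum(alloc.values())

    # Give the remaining blocks to the nodes with the largest fractional parts.
    by_remainder = sorted(ideal, key=lambda n: ideal[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


def even_blocks(nodes: list, num_blocks: int) -> Dict[str, int]:
    """Capability-oblivious baseline: spread blocks as evenly as possible."""
    base, extra = divmod(num_blocks, len(nodes))
    return {node: base + (1 if i < extra else 0) for i, node in enumerate(nodes)}


if __name__ == "__main__":
    # Hypothetical capability scores (e.g. relative map-task throughput).
    caps = {"fast": 3.0, "medium": 2.0, "slow": 1.0}
    print("even:            ", even_blocks(list(caps), 12))
    print("capability-aware:", place_blocks(caps, 12))
```

Under even placement, a fast node would finish its local blocks early and then have to fetch blocks from slower nodes, which is the kind of inter-node transfer the abstract identifies as the key cause of slowdown. Sizing each node's share to its capability is one simple way to keep more of that work local.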