Ran Zheng, Genmao Yu, Hai Jin, Xuanhua Shi, Qin Zhang
Published in: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2016-04-04. DOI: 10.1109/PDP.2016.66
Conch: A Cyclic MapReduce Model for Iterative Applications
The MapReduce programming model is widely used to simplify and speed up data-parallel applications. However, it is inefficient for iterative applications because it repeatedly transfers data to and from HDFS (Hadoop Distributed File System). Conch, a cyclic MapReduce model, is designed to process iterative applications efficiently. To minimize network overhead, shared data is cached locally, and a "map-shuffle" phase with a combined transmission mechanism is introduced. In addition, a prediction scheduler for iterative applications uses runtime information to achieve better data locality. Experiments show that Conch supports iterative applications transparently and efficiently. In a single-job environment, Conch achieves 13%-17% improvements over Hadoop and HaLoop on K-Means and fuzzy C-Means. In a multi-job environment, the improvements are larger: 63.6% and 28.6% compared with Hadoop and HaLoop, respectively.
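The general idea of keeping input data cached across iterations can be illustrated with a small single-process sketch of K-Means written as repeated map, shuffle, and reduce rounds. This is not Conch's implementation: the function names (`iterate_kmeans`, `map_phase`, `shuffle`, `reduce_phase`), the one-dimensional data, and the `load_partitions` callback are assumptions made for illustration. The combined "map-shuffle" transmission and the prediction scheduler are not modeled. The sketch only shows that the input partitions are read once and reused, while the centroids change between iterations.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to a 1-D point."""
    return min(range(len(centroids)), key=lambda i: (point - centroids[i]) ** 2)


def map_phase(partition, centroids):
    """Map: emit (centroid index, (point, 1)) for every point in a partition."""
    return [(nearest(p, centroids), (p, 1)) for p in partition]


def shuffle(pairs):
    """Shuffle: group the emitted values by centroid index."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, centroids):
    """Reduce: average each group; centroids with no points keep their value."""
    updated = list(centroids)
    for idx, values in groups.items():
        total = sum(v for v, _ in values)
        count = sum(c for _, c in values)
        updated[idx] = total / count
    return updated


def iterate_kmeans(load_partitions, centroids, max_iters=20, tol=1e-9):
    """Run K-Means as repeated map/shuffle/reduce rounds.

    load_partitions is a hypothetical stand-in for an expensive read
    (for example from a distributed file system). It is called once,
    and the result is reused by every iteration.
    """
    cache = load_partitions()  # input data is read a single time
    for iteration in range(1, max_iters + 1):
        pairs = []
        for partition in cache:  # each round reuses the cached partitions
            pairs.extend(map_phase(partition, centroids))
        updated = reduce_phase(shuffle(pairs), centroids)
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift < tol:  # converged: centroids stopped moving
            return centroids, iteration
    return centroids, max_iters
```

In a plain MapReduce job chain, the equivalent of `load_partitions` would run at the start of every iteration; here it runs once, which is the effect that local caching aims for.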