Achieving Performance and Programmability for MapReduce(-Like) Frameworks
Jiayang Guo, G. Agrawal
2018 IEEE 25th International Conference on High Performance Computing (HiPC), published 2018-12-01
DOI: https://doi.org/10.1109/HiPC.2018.00043
Citation count: 4
Abstract
Programmability and performance are often considered alternatives in the context of HPC programming systems. For example, general-purpose frameworks like MPI are associated with high performance, and though MapReduce and similar frameworks have demonstrated high programmability, it is also well accepted that they fall short in terms of performance. Providing abstractions that maintain both high programmability and high performance remains an open question. In this paper, we introduce two variations of the original MapReduce API. We demonstrate efficient implementations of the three APIs, focusing on how the API differences impact middleware implementation, and examine the resulting performance. Furthermore, to understand how application characteristics impact the relative performance of the three systems, we develop and validate a performance model. Overall, we show that a MapReduce-like API that requires only a small additional effort from programmers can provide high performance, outperforming Spark significantly.