了x10 MapReduce

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2010-10-12 DOI:10.1145/2020373.2020382

H. Dong, Shujia Zhou, D. Grove

{"title":"了x10 MapReduce","authors":"H. Dong, Shujia Zhou, D. Grove","doi":"10.1145/2020373.2020382","DOIUrl":null,"url":null,"abstract":"The MapReduce framework has become a popular and powerful tool to process large datasets in parallel over a cluster of computing nodes [1]. Currently, there are many flavors of implementations of MapReduce, among which the most popular is the Hadoop implementation in Java [5]. However, these implementations either rely on third-party file systems for across-computer-node communication or are difficult to implement with socket programming or communication libraries such as MPI. To address these challenges, we investigated utilizing the X10 language to implement MapReduce and tested it with the word-count use case. The key performance factor in implementing MapReduce is data moving across different computer nodes. Since X10 has built-in functions for across-node communication such as distributed arrays [2], a major challenge with MapReduce implementations is easily solved. We tested two main implementations: the first utilizes the HashMap data structure and the second a Rail with elements consisting of a string and integer pair. The performance of these two implementations are analyzed and discussed.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"X10-enabled MapReduce\",\"authors\":\"H. Dong, Shujia Zhou, D. Grove\",\"doi\":\"10.1145/2020373.2020382\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The MapReduce framework has become a popular and powerful tool to process large datasets in parallel over a cluster of computing nodes [1]. Currently, there are many flavors of implementations of MapReduce, among which the most popular is the Hadoop implementation in Java [5]. However, these implementations either rely on third-party file systems for across-computer-node communication or are difficult to implement with socket programming or communication libraries such as MPI. To address these challenges, we investigated utilizing the X10 language to implement MapReduce and tested it with the word-count use case. The key performance factor in implementing MapReduce is data moving across different computer nodes. Since X10 has built-in functions for across-node communication such as distributed arrays [2], a major challenge with MapReduce implementations is easily solved. We tested two main implementations: the first utilizes the HashMap data structure and the second a Rail with elements consisting of a string and integer pair. The performance of these two implementations are analyzed and discussed.\",\"PeriodicalId\":245693,\"journal\":{\"name\":\"International Conference on Partitioned Global Address Space Programming Models\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Partitioned Global Address Space Programming Models\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2020373.2020382\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Partitioned Global Address Space Programming Models","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2020373.2020382","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

MapReduce框架已经成为在计算节点集群上并行处理大型数据集的流行且强大的工具[1]。目前，MapReduce的实现方式有很多种，其中最流行的是Java中的Hadoop实现[5]。然而，这些实现要么依赖第三方文件系统进行跨计算机节点通信，要么难以用套接字编程或MPI等通信库实现。为了解决这些挑战，我们研究了使用X10语言来实现MapReduce，并使用单词计数用例对其进行了测试。实现MapReduce的关键性能因素是数据在不同计算机节点之间的移动。由于X10具有内置的跨节点通信功能，如分布式数组[2]，因此MapReduce实现的一个主要挑战很容易解决。我们测试了两种主要的实现:第一个使用HashMap数据结构，第二个使用包含字符串和整数对的元素的Rail。对这两种实现的性能进行了分析和讨论。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

X10-enabled MapReduce

The MapReduce framework has become a popular and powerful tool to process large datasets in parallel over a cluster of computing nodes [1]. Currently, there are many flavors of implementations of MapReduce, among which the most popular is the Hadoop implementation in Java [5]. However, these implementations either rely on third-party file systems for across-computer-node communication or are difficult to implement with socket programming or communication libraries such as MPI. To address these challenges, we investigated utilizing the X10 language to implement MapReduce and tested it with the word-count use case. The key performance factor in implementing MapReduce is data moving across different computer nodes. Since X10 has built-in functions for across-node communication such as distributed arrays [2], a major challenge with MapReduce implementations is easily solved. We tested two main implementations: the first utilizes the HashMap data structure and the second a Rail with elements consisting of a string and integer pair. The performance of these two implementations are analyzed and discussed.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Conference on Partitioned Global Address Space Programming Models

自引率

0.00%

发文量