Evaluating R-Based Big Data Analytic Frameworks

2015 IEEE International Conference on Cluster Computing. Pub date: 2015-09-08. DOI: 10.1109/CLUSTER.2015.86

Mei Liang, C. Trejo, Lavanya Muthu, Linh Ngo, André Luckow, A. Apon


Citation count: 11


Abstract

We study two approaches, rHadoop and H2O, for integrating R, a popular statistical programming environment, into the Hadoop Big Data ecosystem. Using these approaches, as well as a vanilla MapReduce implementation, to solve an analytic question on the airline on-time performance data set, we evaluate the differences in runtime performance and explain the causes of these differences in terms of the design principles of rHadoop and H2O.
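The abstract does not say which analytic question was solved. Purely as an illustration, the Python sketch below walks through the map, shuffle, and reduce phases that a vanilla MapReduce job goes through, applied to a hypothetical question (mean arrival delay per carrier). The field names `UniqueCarrier` and `ArrDelay` and the `"NA"` missing-value marker are assumptions modeled on the public airline on-time data files, and the function names are hypothetical. Everything runs in a single process here, whereas Hadoop distributes these phases across nodes and moves intermediate pairs between them during the shuffle.

```python
"""In-process sketch of the MapReduce pattern: map -> shuffle -> reduce.

Hypothetical task (not taken from the paper): mean arrival delay per carrier.
Column names UniqueCarrier/ArrDelay and the "NA" missing-value marker are
assumptions modeled on the public airline on-time performance CSV files.
"""
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, Tuple[float, int]]


def map_record(record: Dict[str, str]) -> Iterator[Pair]:
    """Map: emit (carrier, (delay, 1)) for each row that has a numeric delay."""
    delay = record.get("ArrDelay", "NA")
    if delay not in ("", "NA"):  # skip rows with a missing delay value
        yield record["UniqueCarrier"], (float(delay), 1)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[Tuple[float, int]]]:
    """Shuffle: group intermediate values by key, as Hadoop does between phases."""
    groups: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key: str, values: List[Tuple[float, int]]) -> Tuple[str, float]:
    """Reduce: combine partial (sum, count) pairs, then divide once."""
    total = sum(v[0] for v in values)
    count = sum(v[1] for v in values)
    return key, total / count


def run_job(records: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_group(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    data = io.StringIO("UniqueCarrier,ArrDelay\nAA,10\nAA,-4\nUA,NA\nUA,30\n")
    print(run_job(csv.DictReader(data)))  # {'AA': 3.0, 'UA': 30.0}
```

Emitting `(sum, count)` pairs instead of per-record means keeps the reduce step associative, so the same logic could also act as a combiner on each node; averaging partial averages directly would give wrong results whenever groups differ in size.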
