Nobody ever got fired for using Hadoop on a cluster

HotCDP '12 · Publication date: 2012-04-10 · DOI: 10.1145/2169090.2169092
A. Rowstron, D. Narayanan, Austin Donnelly, G. O'Shea, Andrew Douglas
{"title":"Nobody ever got fired for using Hadoop on a cluster","authors":"A. Rowstron, D. Narayanan, Austin Donnelly, G. O'Shea, Andrew Douglas","doi":"10.1145/2169090.2169092","DOIUrl":null,"url":null,"abstract":"The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that \"nobody ever got fired for using Hadoop on a cluster\"!\n We completely agree that Hadoop on a cluster is the right solution for jobs where the input data is multi-terabyte or larger. However, in this position paper we ask if this is the right path for general purpose data analytics? Evidence suggests that many MapReduce-like jobs process relatively small input data sets (less than 14 GB). Memory has reached a GB/$ ratio such that it is now technically and financially feasible to have servers with 100s GB of DRAM. We therefore ask, should we be scaling by using single machines with very large memories rather than clusters? We conjecture that, in terms of hardware and programmer time, this may be a better option for the majority of data processing jobs.","PeriodicalId":183902,"journal":{"name":"HotCDP '12","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"84","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"HotCDP '12","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2169090.2169092","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 84

Abstract

The norm for data analytics is now to run them on commodity clusters with MapReduce-like abstractions. One only needs to read the popular blogs to see the evidence of this. We believe that we could now say that "nobody ever got fired for using Hadoop on a cluster"! We completely agree that Hadoop on a cluster is the right solution for jobs where the input data is multi-terabyte or larger. However, in this position paper we ask if this is the right path for general purpose data analytics? Evidence suggests that many MapReduce-like jobs process relatively small input data sets (less than 14 GB). Memory has reached a GB/$ ratio such that it is now technically and financially feasible to have servers with 100s GB of DRAM. We therefore ask, should we be scaling by using single machines with very large memories rather than clusters? We conjecture that, in terms of hardware and programmer time, this may be a better option for the majority of data processing jobs.
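To make the feasibility argument concrete, here is a minimal back-of-the-envelope sketch in Python. The server memory size, DRAM price per GB, working-set blow-up factor, and job mix are all illustrative assumptions, not figures from the paper; the only number taken from the abstract is the "less than 14 GB" input size it cites for many MapReduce-like jobs.

```python
# Back-of-the-envelope sketch of the scale-up argument.
# All constants below are hypothetical assumptions for illustration only,
# except the ~14 GB input size mentioned in the abstract.

SERVER_DRAM_GB = 256        # assumed "100s of GB" class single server
DRAM_PRICE_PER_GB = 10.0    # assumed circa-2012 $/GB for server DRAM
WORKING_SET_FACTOR = 3      # assumed in-memory blow-up of the raw input

# Hypothetical mix of job input sizes in GB (the 14 GB entry echoes the abstract).
job_input_sizes_gb = [0.5, 2, 5, 10, 14, 40, 120, 2000]

# Jobs whose working set fits entirely in one machine's DRAM.
fits = [s for s in job_input_sizes_gb
        if s * WORKING_SET_FACTOR <= SERVER_DRAM_GB]

print(f"DRAM cost for a {SERVER_DRAM_GB} GB server: "
      f"${SERVER_DRAM_GB * DRAM_PRICE_PER_GB:,.0f}")
print(f"{len(fits)}/{len(job_input_sizes_gb)} jobs fit in a single machine's memory")
```

Under these assumed numbers, most of the job mix fits comfortably in one large-memory server, while only the multi-terabyte job clearly requires a cluster, which is the trade-off the paper asks readers to reconsider.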