Application profiling in hierarchical Hadoop for geo-distributed computing environments

Marco Cavallo, G. Modica, Carmelo Polito, O. Tomarchio
DOI: 10.1109/ISCC.2016.7543796
Published in: 2016 IEEE Symposium on Computers and Communication (ISCC)
Publication date: 2016-06-27
Citations: 9

Abstract

Over the past two decades there has been growing interest in defining new distributed computational paradigms capable of manipulating and analyzing huge amounts of data. Among these, MapReduce stands out for its popularity. Its open-source implementation, Hadoop, is widely used in academic environments and is also strongly supported by major IT players. In many application scenarios the data to be manipulated resides in data centers that are heterogeneous in terms of computing capacity and geographically distant from each other. Unfortunately, in these contexts Hadoop performs very poorly. In this paper we propose to leverage a hierarchical computing framework to boost Hadoop performance in geo-distributed computing environments. The framework we propose gathers fresh information from the distributed computing context and exploits it to carry out a smart job scheduling strategy. In this work, the focus is on the study and definition of the application profile of the jobs. We implemented a software prototype of the proposed hierarchical Hadoop framework. Tests run on the prototype demonstrated the capability of the job scheduling system to compute a job's execution path and estimate its completion time.