Application profiling in hierarchical Hadoop for geo-distributed computing environments

2016 IEEE Symposium on Computers and Communication (ISCC). Pub Date: 2016-06-27. DOI: 10.1109/ISCC.2016.7543796

Marco Cavallo, G. Modica, Carmelo Polito, O. Tomarchio


Citations: 9

Abstract


Application profiling in hierarchical Hadoop for geo-distributed computing environments

Over the past two decades there has been growing interest in defining new distributed computational paradigms capable of manipulating and analyzing huge amounts of data. Among these, MapReduce stands out for its popularity. Its open-source implementation, Hadoop, is widely used in academic environments and is also strongly supported by major IT players. In many application scenarios the data to be manipulated resides in data centers that are heterogeneous in terms of computing capacity and geographically distant from each other. Unfortunately, in such contexts Hadoop performs very poorly. In this paper we propose to leverage a hierarchical computing framework to boost Hadoop performance in geo-distributed computing environments. The proposed framework gathers up-to-date information from the distributed computing context and exploits it to carry out a smart job scheduling strategy. This work focuses on the study and definition of the application profile of the jobs. We implemented a software prototype of the proposed hierarchical Hadoop framework. Tests run on the prototype demonstrated the capability of the job scheduling system to compute a job's execution path and estimate its completion time.
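The abstract does not say how the scheduler derives a completion-time estimate from an application profile. As a rough illustration only, the following Python sketch shows one way such an estimate could be computed for a single candidate execution path. Everything in it is an assumption rather than the authors' model: the `Site` and `AppProfile` types, the `output_ratio` field, the per-site throughput and uplink figures, and the `estimate_completion_s` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    data_mb: float          # input data held at this site
    throughput_mb_s: float  # assumed processing rate of this job at this site
    uplink_mb_s: float      # assumed bandwidth from this site to the top-level site


@dataclass
class AppProfile:
    # Hypothetical profile: fraction of the input size that survives as output.
    output_ratio: float


def estimate_completion_s(sites, profile, reduce_throughput_mb_s):
    """Estimate completion time for one candidate execution path.

    Assumed path: every site processes its own data in parallel, ships its
    partial result to a top-level site, and a final reduce merges the results.
    """
    # Sites work in parallel, so the slowest (compute + transfer) dominates.
    slowest = max(
        s.data_mb / s.throughput_mb_s
        + (s.data_mb * profile.output_ratio) / s.uplink_mb_s
        for s in sites
    )
    # The final reduce runs over the sum of all partial results.
    partial_mb = sum(s.data_mb * profile.output_ratio for s in sites)
    return slowest + partial_mb / reduce_throughput_mb_s


sites = [
    Site("A", data_mb=1000, throughput_mb_s=50, uplink_mb_s=10),
    Site("B", data_mb=4000, throughput_mb_s=100, uplink_mb_s=20),
]
profile = AppProfile(output_ratio=0.25)
estimate = estimate_completion_s(sites, profile, reduce_throughput_mb_s=50)
print(f"estimated completion time: {estimate:.1f} s")
```

In this toy setting, a scheduler could evaluate several candidate paths with a function of this kind and pick the one with the lowest estimate. The numbers above are made up for illustration and are not measurements from the paper.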

