A Novel Middleware Framework for Implementing Bigdata Analytics in Multi Cloud Environment

Salman Hussain, M. Chowdhury
Published in: 2018 26th International Conference on Systems Engineering (ICSEng), 2018-12-01
DOI: 10.1109/ICSENG.2018.8638175 (https://doi.org/10.1109/ICSENG.2018.8638175)
Citations: 1

Abstract

Over the past two decades, the rise of big data, cloud computing and IoT has brought an explosion in the way data is generated, stored and analysed. Data analytics has become the driving factor of many businesses, many of which are interlinked with one another or operate in various parts of the world. Data storage is now very much geographically distributed, and the era of dedicated datacentres is long gone. There are now many scenarios in which data resides in several datacentres or clouds that are geographically distributed and differ in computational and network capacity. Unfortunately, the currently available paradigms perform poorly in such scenarios. This has given rise to the need for a computational paradigm capable of analysing data across a geographically distributed environment.

In this paper we propose a hierarchical framework to improve the performance of Hadoop in a multi-cloud or geographically distributed environment. We chose Hadoop because it is an implementation of the popular MapReduce paradigm. The proposed framework takes all of this heterogeneity into account and uses it to carry out a dynamic job scheduling strategy that gives the best execution path with the least latency. The framework dictates the job scheduling technique to be used in a geo-distributed environment while keeping the Hadoop framework intact. The low-level computations are handled by the plain Hadoop implementation; job scheduling and data distribution are where the proposed framework contributes.

Our primary focus in this work has been to set up a software prototype of the execution environment in order to describe, devise and calculate the factors that would be vital in predicting the best execution path and the data division methodology. We successfully set up a semi-virtual prototype environment using 3 machines; the prototype met all the theoretical and practical benchmarks as a geo-distributed multi-cloud environment. Test runs were carried out, and the two primary factors, namely the computational factor and the reduction factor, were calculated.
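The scheduling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the abstract does not define the two factors. The sketch assumes that the computational factor is the number of seconds a site needs to process one MB, and that the reduction factor is the ratio of a job's output size to its input size. The names `Site`, `link`, `path_latency` and `best_aggregator`, the bandwidth table, and the cost model itself are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    data_mb: float           # input data already stored at this site
    compute_s_per_mb: float  # ASSUMED "computational factor": seconds per MB


def link(a: str, b: str) -> frozenset:
    """Dictionary key for an undirected network link between two sites."""
    return frozenset((a, b))


def path_latency(sites, bandwidth_mb_s, aggregator, reduction_factor):
    """Estimate completion time when every site processes its own data
    locally and ships the reduced output to the `aggregator` site.

    ASSUMED cost model: local jobs run in parallel, each transfer starts
    when its local job ends, and the aggregator then runs one final pass
    over all intermediate outputs.
    """
    arrival_times = []
    for s in sites:
        local_s = s.data_mb * s.compute_s_per_mb
        out_mb = s.data_mb * reduction_factor  # ASSUMED "reduction factor"
        if s.name == aggregator:
            transfer_s = 0.0
        else:
            transfer_s = out_mb / bandwidth_mb_s[link(s.name, aggregator)]
        arrival_times.append(local_s + transfer_s)

    agg = next(s for s in sites if s.name == aggregator)
    total_out_mb = sum(s.data_mb * reduction_factor for s in sites)
    final_s = total_out_mb * agg.compute_s_per_mb
    return max(arrival_times) + final_s


def best_aggregator(sites, bandwidth_mb_s, reduction_factor):
    """Return the name of the site that minimises the estimated latency."""
    return min(
        (s.name for s in sites),
        key=lambda name: path_latency(
            sites, bandwidth_mb_s, name, reduction_factor
        ),
    )


# Three sites, mirroring the three-machine prototype; all numbers are made up.
example_sites = [
    Site("A", data_mb=1000, compute_s_per_mb=0.01),
    Site("B", data_mb=500, compute_s_per_mb=0.02),
    Site("C", data_mb=200, compute_s_per_mb=0.05),
]
example_bandwidth = {
    link("A", "B"): 10.0,  # MB/s
    link("A", "C"): 5.0,
    link("B", "C"): 2.0,
}
print(best_aggregator(example_sites, example_bandwidth, reduction_factor=0.1))
```

Under this assumed model, a smaller reduction factor shrinks the data that must cross the wide-area links, so the choice of aggregation site depends increasingly on compute speed and less on bandwidth.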