A NOVEL MIDDLEWARE FRAMEWORK FOR IMPLEMENTING BIGDATA ANALYTICS IN MULTI CLOUD ENVIRONMENT
Salman Hussain, M. Chowdhury
2018 26th International Conference on Systems Engineering (ICSEng), December 2018
DOI: 10.1109/ICSENG.2018.8638175
Over the past two decades, the rise of big data, cloud computing and the Internet of Things (IoT) has caused an explosion in the way data is generated, stored and analysed. Data analytics has become a driving factor for many businesses, many of which are interlinked with one another or operate in different parts of the world. Data storage is now largely geographically distributed, and the era of dedicated datacentres is long gone. In many current scenarios, data resides in several datacentres or clouds that are geographically distributed and differ in both computational capacity and network capacity. Unfortunately, the currently available paradigms perform poorly in such scenarios. This has created the need for a computational paradigm capable of analysing data across a geographically distributed environment.
In this paper we propose a hierarchical framework to improve the performance of Hadoop in a multi-cloud or geographically distributed environment. We chose Hadoop because it is an implementation of the popular MapReduce paradigm. The proposed framework takes this heterogeneity into account and uses it to carry out a dynamic job scheduling strategy that identifies the execution path with the least latency. In effect, the framework determines the best job scheduling technique for a geo-distributed environment while keeping the Hadoop framework intact. The plain Hadoop implementation handles the low-level computations, and the proposed framework contributes the job scheduling and data distribution.
Our primary focus in this work has been to set up a software prototype of the execution environment in order to describe, devise and calculate the factors that are vital for predicting the best execution path and the data division methodology. We set up a semi-virtual prototype environment using three machines, and the prototype met all the theoretical and practical benchmarks expected of a geo-distributed multi-cloud environment. Test runs were performed, and the two primary factors, namely the computational factor and the reduction factor, were calculated.
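The abstract does not give formulas for the computational factor or the reduction factor, and it does not state the exact scheduling rule. The Python sketch below is an illustration only. It assumes that a site's computational factor is its processing time per GB and that its reduction factor is the ratio of map output size to map input size. It then uses these two values to choose which site should host the global reduce step so that the estimated job latency is lowest. The names `Site`, `estimate_latency` and `best_reducer` are hypothetical, as are the latency model and all the numbers; none of them come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Site:
    """One datacentre or cloud in the geo-distributed deployment (hypothetical model)."""
    name: str
    data_gb: float           # input data already stored at this site
    compute_factor: float    # ASSUMED meaning: seconds of processing per GB at this site
    reduction_factor: float  # ASSUMED meaning: map-output size / map-input size


def estimate_latency(sites: list[Site], bandwidth: dict[tuple[str, str], float],
                     reducer: Site) -> float:
    """Estimate job latency (seconds) if `reducer` runs the global reduce step.

    Assumed model: every site runs its local Hadoop map phase in parallel, then
    ships its reduced output to the reducer site over a link of the given
    bandwidth (GB/s). The job ends when the last output arrives, plus the time
    the reducer needs to process the merged data at its own compute factor.
    """
    last_arrival = 0.0
    merged_gb = 0.0
    for site in sites:
        output_gb = site.data_gb * site.reduction_factor
        if site.name == reducer.name:
            transfer_s = 0.0  # output is already local to the reducer
        else:
            transfer_s = output_gb / bandwidth[(site.name, reducer.name)]
        last_arrival = max(last_arrival, site.data_gb * site.compute_factor + transfer_s)
        merged_gb += output_gb
    return last_arrival + merged_gb * reducer.compute_factor


def best_reducer(sites: list[Site], bandwidth: dict[tuple[str, str], float]) -> Site:
    """Pick the execution path (choice of global reducer) with the lowest estimated latency."""
    return min(sites, key=lambda candidate: estimate_latency(sites, bandwidth, candidate))


# Hypothetical three-site example (values are illustrative, not measured).
sites = [
    Site("A", data_gb=100, compute_factor=2.0, reduction_factor=0.1),
    Site("B", data_gb=50, compute_factor=4.0, reduction_factor=0.2),
    Site("C", data_gb=20, compute_factor=1.0, reduction_factor=0.5),
]
links = {("A", "B"): 0.5, ("A", "C"): 0.1, ("B", "C"): 0.2}  # GB/s, assumed symmetric
bandwidth = {**links, **{(b, a): bw for (a, b), bw in links.items()}}

best = best_reducer(sites, bandwidth)
print(best.name, estimate_latency(sites, bandwidth, best))
```

With these illustrative numbers, hosting the reduce step at site A gives the lowest estimate: 280 s, against 340 s for B and 330 s for C. B's slower compute factor makes its reduce step expensive. For C, the slow A to C link delays the arrival of A's output. A fuller scheduler would also decide how to divide the input data between sites, and this sketch leaves that step out.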