Docker container-based big data processing system in multiple clouds for everyone

2017 IEEE International Systems Engineering Symposium (ISSE). Pub Date: 2017-10-30. DOI: 10.1109/SYSENG.2017.8088294

N. Naik


Citations: 33

Abstract

Big data processing is becoming essential for all kinds of users, whatever their application area, to extract meaningful information from large volumes of data. It is a broad term covering several operations, including the storage, cleaning, organization, modelling, analysis and presentation of data at scale and with efficiency. For ordinary users, the main challenges are the need for a powerful data processing system and its provisioning, the installation of complex big data analytics tools, and the difficulty of using them. Docker is a container-based virtualization technology that has recently introduced Docker Swarm for developing various types of multi-cloud distributed systems, which can help solve all of the above problems for ordinary users. However, Docker is predominantly used in the software development industry, and less attention has been given to the data processing side of this container-based technology. This paper therefore proposes a Docker container-based big data processing system in multiple clouds for everyone, exploring another potential dimension of Docker: big data analysis. The system is an inexpensive and user-friendly framework for anyone with basic IT skills, and it can be easily developed on a single machine, multiple machines or multiple clouds. The paper presents the architectural design and simulated development of the proposed system. It then illustrates the automated provisioning of big data clusters using two popular big data analytics platforms, Hadoop and Pachyderm (without Hadoop), including the Web-based GUI Hue for easy data processing in Hadoop.
