Big Data Technologies on Commodity Workstations: A Basic Setup for Apache Impala

Marin Fotache, Valerica Greavu-Serban, Ionut Hrubaru, Alexandru Tica
Proceedings of the 19th International Conference on Computer Systems and Technologies, published 2018-09-13.
DOI: 10.1145/3274005.3274021 (https://doi.org/10.1145/3274005.3274021)
Citations: 1

Abstract

Big Data technologies brought the idea of parallel processing on cheaper commodity servers. When dealing with huge amounts of data, instead of migrating to more powerful and costly hardware platforms or buying resources in the cloud, it is more affordable to add a number of cheaper servers as nodes for data processing and/or storage. NoSQL data stores, Hadoop ecosystems and NewSQL platforms have proved viable for Big Data storage and processing. In this paper we set up a platform for Big Data processing using commodity workstations. Many small and medium-sized companies have limited resources, and their workstations remain unused for more than 12 hours a day; here, Beowulf cluster computing could prove useful. Apache Impala was installed, as part of a Hadoop distribution, on a 9-node cluster. Three TPC-H databases were loaded, at scale factors 1, 2 and 10 (roughly 1, 2 and 10 GB of data). For each scale factor, a series of 100 randomly generated SQL queries was executed. The results were collected and analyzed to determine whether the cluster can provide a decent level of data processing performance.
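The following Python sketch shows what the workload described above might look like in practice. It is an illustration, not the authors' tooling. It computes approximate TPC-H table sizes for scale factors 1, 2 and 10 and generates a repeatable set of 100 random aggregate queries per scale factor. The base row counts follow the TPC-H specification, with LINEITEM only approximate. The query templates, the `random_queries` and `approx_rows` helpers and the seeding scheme are hypothetical.

```python
import random

# Approximate TPC-H row counts at scale factor 1 (per the TPC-H specification).
# NATION and REGION have a fixed size; the other tables grow linearly with SF.
# LINEITEM is approximate (about 6,001,215 rows at SF 1).
BASE_ROWS = {
    "region": 5,
    "nation": 25,
    "supplier": 10_000,
    "customer": 150_000,
    "part": 200_000,
    "partsupp": 800_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,
}
FIXED_TABLES = {"region", "nation"}


def approx_rows(scale_factor: int) -> dict[str, int]:
    """Return approximate row counts per table for a given TPC-H scale factor."""
    return {
        table: rows if table in FIXED_TABLES else rows * scale_factor
        for table, rows in BASE_ROWS.items()
    }


# Hypothetical query templates: (table, grouping column, numeric column).
# Column names are real TPC-H columns; the templates are NOT the paper's generator.
TEMPLATES = [
    ("lineitem", "l_returnflag", "l_extendedprice"),
    ("lineitem", "l_linestatus", "l_quantity"),
    ("orders", "o_orderpriority", "o_totalprice"),
    ("orders", "o_orderstatus", "o_totalprice"),
    ("customer", "c_mktsegment", "c_acctbal"),
]
AGGREGATES = ["COUNT", "SUM", "AVG", "MIN", "MAX"]


def random_queries(n: int, seed: int) -> list[str]:
    """Generate n random aggregate queries; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    queries = []
    for _ in range(n):
        table, group_col, num_col = rng.choice(TEMPLATES)
        agg = rng.choice(AGGREGATES)
        threshold = rng.randint(1, 50)
        queries.append(
            f"SELECT {group_col}, {agg}({num_col}) FROM {table} "
            f"WHERE {num_col} > {threshold} GROUP BY {group_col}"
        )
    return queries


if __name__ == "__main__":
    for sf in (1, 2, 10):
        workload = random_queries(100, seed=sf)
        print(f"SF {sf}: lineitem ~{approx_rows(sf)['lineitem']:,} rows, "
              f"{len(workload)} queries, first: {workload[0]}")
```

Seeding the generator per scale factor keeps each workload reproducible. The same 100 queries can then be rerun, for example through `impala-shell`, after a configuration change on the cluster.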