A Hive and SQL Case Study in Cloud Data Analytics

Shireesha Chandra, A. Varde, Jiayin Wang
Published in: 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2019-10-01
DOI: 10.1109/UEMCON47517.2019.8992925
https://doi.org/10.1109/UEMCON47517.2019.8992925
Citations: 5

Abstract

The digital universe is expanding rapidly, generating massive datasets. To keep up with the processing and storage needs of this big data, and to discover knowledge from it, we need scalable infrastructure and technologies that can access data from multiple disks simultaneously. Cloud computing provides paradigms for data analytics over such huge datasets. While SQL remains popular among database and data mining professionals, Hive has in recent years established itself as a rapidly advancing big data technology, which makes it highly suitable for use over the cloud. In this paper, we present investigatory research on Hive and SQL, with a detailed case study comparing the two in the context of cloud data management and mining. Our work focuses on processing Hive queries on cloud infrastructure using three different approaches, and also examines SQL processing on the cloud with similar approaches. Real datasets are used to conduct various operations in Hive and SQL. The paper compares the performance of the two technologies, explains the environments in which one is preferred over the other for processing and analyzing data, and provides recommendations for cloud data analytics based on the case study.