Automated Table Partitioner (ATAP) in Apache Hive

Thivviyan Amirthalingam, H. Rais
Published in: 2018 4th International Conference on Computer and Information Sciences (ICCOINS), 2018-08-01
DOI: 10.1109/ICCOINS.2018.8510580 (https://doi.org/10.1109/ICCOINS.2018.8510580)

Abstract

Big Data and Predictive Analytics have been a game-changing paradigm in academia and industry for the past decade, inspiring numerous efforts in multiple spaces. One such technology is Hadoop, an open-source framework based on MapReduce for highly distributed and scalable solutions. As Hadoop became more popular, other technologies were built around it, making it an ecosystem in itself. Currently, hundreds of tools and utilities add on to the Hadoop framework, and Apache Hive is one of the most prominent. Hive is built as a data warehousing layer that interacts with Hadoop and the underlying filesystem, HDFS. It quickly became the market leader in query processing because it provides a better user experience than MapReduce. Nevertheless, it imposes rigid structures that do not adapt to the ever-changing nature of data. This paper proposes a novel means of automating table partitioning in Hive. It includes a lexical analyzer that reads HiveQL queries and, in response, issues Data Definition Language (DDL) statements to restructure a table if a particular column is read more often than a user-set coefficient factor allows. Multiple experiments conducted for this research returned results that further support the feasibility, adaptability and usability of this proof of concept.
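
The mechanism described in the abstract (scan HiveQL queries, count how often each column is read, and emit DDL once a column crosses the user-set threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that "read" means a column appearing in a WHERE clause, that the coefficient factor is the fraction of observed queries that filter on the column, and that restructuring means copying the data into a new partitioned table. The names `where_columns`, `suggest_partition_ddl` and the `_part` table suffix are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: quoted strings, identifiers, numbers, operators, other symbols.
TOKEN_RE = re.compile(r"'[^']*'|\"[^\"]*\"|[A-Za-z_][A-Za-z0-9_]*|\d+|[<>=!]+|\S")

# Keywords assumed to end the WHERE clause in this simplified scan.
CLAUSE_END = {"group", "order", "having", "limit"}


def where_columns(query, known_columns):
    """Return the known columns referenced in the WHERE clause of one query."""
    tokens = [t.lower() for t in TOKEN_RE.findall(query)]
    if "where" not in tokens:
        return set()
    found = set()
    for tok in tokens[tokens.index("where") + 1:]:
        if tok in CLAUSE_END:
            break
        if tok in known_columns:
            found.add(tok)
    return found


def suggest_partition_ddl(table, schema, queries, coefficient):
    """Suggest HiveQL statements that repartition `table` by its most-filtered column.

    schema: dict mapping column name -> Hive type, in table order.
    coefficient: assumed here to be a fraction of queries in [0, 1].
    Returns an empty list when no column exceeds the coefficient.
    """
    if not queries:
        return []
    counts = Counter()
    for query in queries:
        counts.update(where_columns(query, set(schema)))
    if not counts:
        return []
    column, hits = counts.most_common(1)[0]
    if hits / len(queries) <= coefficient:
        return []
    others = [c for c in schema if c != column]
    cols_ddl = ", ".join(f"{c} {schema[c]}" for c in others)
    select_list = ", ".join(others + [column])  # partition column goes last
    return [
        f"CREATE TABLE {table}_part ({cols_ddl}) "
        f"PARTITIONED BY ({column} {schema[column]})",
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        f"INSERT OVERWRITE TABLE {table}_part PARTITION ({column}) "
        f"SELECT {select_list} FROM {table}",
    ]


if __name__ == "__main__":
    schema = {"id": "INT", "country": "STRING", "amount": "DOUBLE"}
    queries = [
        "SELECT id FROM sales WHERE country = 'MY'",
        "SELECT amount FROM sales WHERE country = 'SG' AND amount > 10",
        "SELECT * FROM sales",
    ]
    for statement in suggest_partition_ddl("sales", schema, queries, 0.5):
        print(statement + ";")
```

A regex tokenizer keeps the sketch short, but it ignores subqueries, joins, aliases and comments, so a real lexical analyzer for HiveQL would need a proper grammar. The generated statements rely on Hive's dynamic partitioning, which is why the partition column is placed last in the SELECT list.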