Automated Table Partitioner (ATAP) in Apache Hive
Thivviyan Amirthalingam, H. Rais
2018 4th International Conference on Computer and Information Sciences (ICCOINS), published 2018-08-01
DOI: 10.1109/ICCOINS.2018.8510580 (https://doi.org/10.1109/ICCOINS.2018.8510580)
Citation count: 0
Abstract
Big Data and Predictive Analytics have been game-changing paradigms in academia and industry for the past decade, inspiring numerous efforts across multiple fields. One such technology is Hadoop, an open-source framework based on MapReduce for highly distributed and scalable solutions. As Hadoop grew in popularity, other technologies were built around it, turning it into an ecosystem of its own. There are now hundreds of tools and utilities that add on to the Hadoop framework, and Apache Hive is one of the most prominent. Hive is a data warehousing layer that interacts with Hadoop and its underlying filesystem, HDFS. It quickly became the market leader in query processing because it provides a better user experience than MapReduce. Nevertheless, it imposes rigid structures that do not adapt to the ever-changing nature of data. This paper proposes a novel means of automating table partitioning in Hive. It includes a lexical analyzer that reads HiveQL queries and, in response, issues Data Definition Language (DDL) statements to restructure a table if a particular column is read more often than a user-set coefficient factor allows. Multiple experiments conducted for this research returned results that support the feasibility, adaptability and usability of this proof of concept.
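The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the regular expressions, and the `_partitioned` table suffix are all hypothetical, and the "coefficient factor" is assumed here to be the fraction of logged queries that filter on a column in their WHERE clause. Because Hive does not turn an existing unpartitioned table into a partitioned one in place, the sketch emits one common restructuring pattern: create a partitioned copy, then fill it with a dynamic-partition insert.

```python
import re
from collections import Counter

# Hypothetical patterns for a very small subset of HiveQL; a real lexer
# would tokenize the full grammar instead of relying on regexes.
STRING_RE = re.compile(r"'[^']*'")
WHERE_RE = re.compile(
    r"\bWHERE\b(.*?)(?:\bGROUP\s+BY\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)
PREDICATE_RE = re.compile(
    r"\b([A-Za-z_][A-Za-z0-9_]*)\s*"
    r"(?:=|<>|!=|<=|>=|<|>|\bIN\b|\bBETWEEN\b|\bLIKE\b)",
    re.IGNORECASE,
)


def column_filter_counts(queries, columns):
    """Count, per known column, how many queries filter on it in WHERE."""
    known = {c.lower() for c in columns}
    counts = Counter()
    for query in queries:
        query = STRING_RE.sub("''", query)  # blank out string literals
        match = WHERE_RE.search(query)
        if not match:
            continue
        seen = {name.lower() for name in PREDICATE_RE.findall(match.group(1))}
        counts.update(seen & known)  # each column counts once per query
    return counts


def partition_candidates(queries, columns, coefficient):
    """Columns filtered on by more than `coefficient` (assumed 0..1) of queries."""
    if not queries:
        return []
    counts = column_filter_counts(queries, columns)
    return sorted(c for c, n in counts.items() if n / len(queries) > coefficient)


def restructure_statements(table, schema, partition_col):
    """Build HiveQL that copies `table` into a copy partitioned by one column.

    `schema` maps lowercase column names to Hive types, in table order.
    """
    target = f"{table}_partitioned"  # hypothetical naming convention
    data_cols = [c for c in schema if c != partition_col]
    col_defs = ", ".join(f"{c} {schema[c]}" for c in data_cols)
    select_list = ", ".join(data_cols + [partition_col])  # partition column last
    return [
        f"CREATE TABLE {target} ({col_defs}) "
        f"PARTITIONED BY ({partition_col} {schema[partition_col]})",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_col}) "
        f"SELECT {select_list} FROM {table}",
    ]
```

For example, given a query log in which half of the statements filter on `country` and a coefficient of 0.4, `partition_candidates` would return `country`, and `restructure_statements("sales", schema, "country")` would produce the statements needed to build `sales_partitioned`. Counting each column once per query keeps a single query with many predicates from dominating the ratio; whether the paper counts reads this way is not stated in the abstract.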