Proposed adaptive indexing for Hive

2015 International Symposium on Mathematical Sciences and Computing Research (iSMSC). Pub. date: 2015-05-19. DOI: 10.1109/ISMSC.2015.7594057

A. Abdullahi, Rohiza bt Ahmad, Nordin Zakaria


Citations: 2

Abstract

The value of Big Data depends largely on its analytical outcomes, and MapReduce has so far been the most efficient tool for analyzing such data. However, the low-level nature of MapReduce programming has driven the development of high-level abstractions, namely High-Level Query Languages (HLQLs) such as Hive, Pig, and JAQL. These languages can be categorized as either dataflow-based or OLAP-based. For OLAP-based HLQLs, and Hive in particular, the speed at which big data is retrieved for analysis still needs improvement, and indexing is one of the techniques used for this purpose. The current indexing approach nevertheless has shortcomings: it is performed manually and externally, using index inclusion and two-way data scanning. As a result, it requires substantial computation time and space and does not scale to the data volumes that big data is expected to reach. This paper therefore proposes an adaptive indexing framework that aims to reduce both the computation time and the memory usage of the indexing process. The technique inspects user queries to determine whether indexing is necessary, and its indexing strategy uses internal indexing with one-way data scanning. The initial framework of the technique is presented and discussed.
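The abstract describes the technique only at the level of a framework, so the following Python sketch is an illustration under stated assumptions, not the authors' design. It models the two ideas the abstract names: deciding from the user's queries whether an index is worth building, and building that index in a single (one-way) pass over the data. The following are all assumptions not taken from the paper: the equality-predicate counter, the fixed `threshold`, and the in-memory posting-list index held inside the engine as a stand-in for "internal indexing". The class `AdaptiveIndexer` and method `query_eq` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class AdaptiveIndexer:
    """Hypothetical query-driven indexer (illustrative sketch only).

    Assumption: a column is indexed only after equality queries on it
    have been seen `threshold` times; the index lives inside this object
    rather than in a separate, manually created index table.
    """
    rows: List[Row]
    threshold: int = 2
    predicate_counts: Counter = field(default_factory=Counter)
    indexes: Dict[str, Dict[Any, List[int]]] = field(default_factory=dict)
    rows_scanned: int = 0  # total rows read, to make scan cost visible

    def _build_index(self, column: str) -> Dict[Any, List[int]]:
        # One-way (single-pass) scan: each row is read exactly once and its
        # position is appended to the posting list for its column value.
        index: Dict[Any, List[int]] = defaultdict(list)
        for pos, row in enumerate(self.rows):
            index[row[column]].append(pos)
            self.rows_scanned += 1
        return dict(index)

    def query_eq(self, column: str, value: Any) -> List[Row]:
        """Answer `WHERE column = value`, deciding adaptively whether to index."""
        self.predicate_counts[column] += 1
        # Query inspection: build an index only once the column is "hot".
        if column not in self.indexes and self.predicate_counts[column] >= self.threshold:
            self.indexes[column] = self._build_index(column)
        if column in self.indexes:
            # Index lookup: no scan of the base data is needed.
            return [self.rows[p] for p in self.indexes[column].get(value, [])]
        # Below the threshold: fall back to a full scan, as without an index.
        self.rows_scanned += len(self.rows)
        return [r for r in self.rows if r[column] == value]


if __name__ == "__main__":
    data = [{"id": i, "country": c} for i, c in enumerate(["MY", "NG", "MY", "US"])]
    ix = AdaptiveIndexer(data, threshold=2)
    ix.query_eq("country", "MY")  # 1st query: full scan, no index yet
    ix.query_eq("country", "MY")  # 2nd query: threshold reached, one-pass build
    ix.query_eq("country", "NG")  # answered from the index, no scan
    print(ix.rows_scanned, sorted(ix.indexes))
```

With `threshold=2`, the first equality query on a column falls back to a full scan, the second triggers a single-pass index build, and later queries on that column are answered from the index without touching the base rows. A real Hive integration would also need to persist the index and hook into query planning, which this sketch does not model.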



