Proposed adaptive indexing for Hive

A. Abdullahi, Rohiza bt Ahmad, Nordin Zakaria
Published in: 2015 International Symposium on Mathematical Sciences and Computing Research (iSMSC), 2015-05-19
DOI: https://doi.org/10.1109/ISMSC.2015.7594057
Citations: 2

Abstract

The value of Big Data largely depends on the outcomes of analyzing it, and MapReduce has so far been the most efficient tool for performing that analysis. However, the low-level nature of MapReduce programming necessitates high-level abstractions, namely High-Level Query Languages (HLQLs) such as Hive, Pig, and JAQL. These languages can be categorized as either dataflow-based or OLAP-based. For OLAP-based HLQLs, Hive in particular, the speed of retrieving big data for analysis still needs improvement, and indexing is one of the techniques used for this purpose. Yet the current indexing approach has weaknesses: it is performed manually and externally, using index inclusion and two-way data scanning. It requires substantial computational time and space and hence will not scale to the future potential size of big data. Thus, an adaptive indexing framework is proposed to improve both the computational time and the memory usage of the indexing process. The technique checks user queries to determine whether indexing is necessary and uses internal indexing with a one-way data scanning approach as its indexing strategy. This paper presents and discusses the initial framework of the technique.
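
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' framework. It assumes that "checking user queries" means counting how often a column is filtered on, and that "one-way data scanning" means building the index during the same single pass that answers the query. The class `AdaptiveIndexer`, its `threshold` parameter, and the in-memory table of dict rows are all hypothetical stand-ins; nothing here uses Hive or MapReduce.

```python
from collections import defaultdict


class AdaptiveIndexer:
    """Toy adaptive index over an in-memory table (a list of dict rows).

    Hypothetical illustration only: names and the threshold policy are
    assumptions, not taken from the paper.
    """

    def __init__(self, rows, threshold=2):
        self.rows = rows
        self.threshold = threshold            # queries on a column before it is indexed
        self.query_counts = defaultdict(int)  # column -> number of queries seen
        self.indexes = {}                     # column -> {value: [row positions]}
        self.full_scans = 0                   # how many full table scans were run

    def select(self, column, value):
        """Return all rows where row[column] == value."""
        self.query_counts[column] += 1

        # Index already exists: answer without scanning the table.
        if column in self.indexes:
            positions = self.indexes[column].get(value, [])
            return [self.rows[p] for p in positions]

        # Decide from the query history whether indexing is now worthwhile.
        build = self.query_counts[column] >= self.threshold
        index = defaultdict(list) if build else None

        # Single pass: collect matches and, if requested, fill the index.
        matches = []
        self.full_scans += 1
        for pos, row in enumerate(self.rows):
            key = row[column]
            if index is not None:
                index[key].append(pos)
            if key == value:
                matches.append(row)

        if index is not None:
            self.indexes[column] = dict(index)
        return matches


rows = [
    {"id": 1, "city": "Ipoh"},
    {"id": 2, "city": "Perak"},
    {"id": 3, "city": "Ipoh"},
]
indexer = AdaptiveIndexer(rows, threshold=2)
indexer.select("city", "Ipoh")   # first query: plain scan, no index built
indexer.select("city", "Ipoh")   # second query: scan that also builds the index
indexer.select("city", "Perak")  # third query: answered from the index
```

In this sketch the index is never built by a separate pass over the data, which is the contrast with a manual, external index build followed by a second scan to answer the query.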