Performance Gain in HIVE through Query Optimization using Index Joins

International journal of database theory and application Pub Date : 2017-09-30 DOI:10.14257/ijdta.2017.10.9.02

Stephen Neal Joshua Eali, N. Thirupathi Rao, Swathi Kalam, D. Bhattacharyya, Hye-jin Kim

Pages: 11-22

Citations: 1

Abstract

Index joins are pivotal for efficiency and scalability when processing queries over big data. Hive is a batch-oriented big data management engine that is well suited to data analysis applications and OLAP. For highly "selective" queries, whose output sizes are a small fraction of the input data, the brute-force approach performs poorly because of redundant disk I/O operations or because it launches extra map operations. In this paper we propose an index join technique to speed up query processing and incorporate it into Hive by mapping our design to the conceptual optimization flow. To assess performance, we propose and measure test queries on datasets generated using the TPC-H benchmark. Our results show a significant performance gain on relatively large data sets and/or highly selective queries that have a two-way join and one join condition.
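The contrast the abstract draws between a brute-force join and an index join on a selective query can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use any Hive API: the tables are hypothetical toy rows that borrow TPC-H column names, and `scan_join`, `build_index`, and `index_join` are illustrative helper names. The sketch counts how many inner-table rows each strategy reads for a two-way join with one join condition.

```python
from collections import defaultdict

# Hypothetical toy tables, loosely shaped like TPC-H orders and lineitem rows.
orders = [
    {"o_orderkey": 1, "o_orderpriority": "1-URGENT"},
    {"o_orderkey": 2, "o_orderpriority": "5-LOW"},
    {"o_orderkey": 3, "o_orderpriority": "5-LOW"},
]
lineitem = [
    {"l_orderkey": 1, "l_quantity": 17},
    {"l_orderkey": 1, "l_quantity": 36},
    {"l_orderkey": 2, "l_quantity": 8},
    {"l_orderkey": 3, "l_quantity": 28},
]


def scan_join(outer, inner, outer_key, inner_key, predicate):
    """Brute-force join: scan the whole inner table for every selected outer row."""
    result, rows_read = [], 0
    for o in outer:
        if not predicate(o):
            continue
        for i in inner:
            rows_read += 1
            if o[outer_key] == i[inner_key]:
                result.append({**o, **i})
    return result, rows_read


def build_index(table, key):
    """Map each join-key value to the positions of the rows that carry it."""
    index = defaultdict(list)
    for pos, row in enumerate(table):
        index[row[key]].append(pos)
    return index


def index_join(outer, inner, inner_index, outer_key, predicate):
    """Index join: read only the inner rows that the index says will match."""
    result, rows_read = [], 0
    for o in outer:
        if not predicate(o):
            continue
        for pos in inner_index.get(o[outer_key], []):
            rows_read += 1
            result.append({**o, **inner[pos]})
    return result, rows_read


# A selective predicate: only one of the three orders qualifies.
def is_urgent(order):
    return order["o_orderpriority"] == "1-URGENT"


idx = build_index(lineitem, "l_orderkey")
scan_rows, scan_reads = scan_join(orders, lineitem, "o_orderkey", "l_orderkey", is_urgent)
index_rows, index_reads = index_join(orders, lineitem, idx, "o_orderkey", is_urgent)
print(scan_reads, index_reads)
```

Both strategies return the same joined rows, but the index join touches only the matching inner rows. The gap widens as the predicate becomes more selective, which is consistent with the case in which the abstract reports gains. Building the index has its own cost, which this sketch does not measure.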

Source journal

International journal of database theory and application

Self-citation rate

0.00%

Publication count