Holistic indexing: offline, online and adaptive indexing in the same kernel

PhD '12 Pub Date: 2012-05-20 DOI: 10.1145/2213598.2213604

E. Petraki


Citations: 1

Abstract

Proper physical design is critical to the performance of modern database systems and applications. A growing number of applications, e.g., social networks, scientific databases and multimedia databases, must execute dynamic and exploratory workloads whose unpredictable characteristics change over time. In addition, as most modern applications move into the big data era, investing time and resources in building the wrong set of indexes over large collections of data can severely affect performance.

Offline, online and adaptive indexing are three distinct approaches to automating physical design choices. Offline indexing is best in static environments with stable workloads. Online indexing is best in relatively dynamic environments where the query workload can be monitored. Adaptive indexing is best in fully dynamic environments where neither idle time nor workload knowledge can be assumed. We observe that these three approaches are complementary, and that none of them can satisfy the needs of modern applications in isolation.

We envision a new index selection approach, holistic indexing, that surpasses its predecessors by combining the best features of offline, online and adaptive indexing while overcoming their weaknesses. The main goal is a database kernel that autonomously creates partial indexes and continuously refines them during query processing, as in adaptive indexing, while also continuously detecting opportunities to improve the physical design offline: whenever idle time occurs, it tries to exploit knowledge gathered during query processing to refine existing indexes further or create new ones. We sketch the research space and the new challenges such a direction brings.
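
The combination described in the abstract (partial indexes refined as a side effect of queries, plus extra refinement during idle time) can be illustrated with a small toy. The Python sketch below is not taken from the paper: the class `CrackedColumn`, its method names, and the idle-time policy (split the largest piece at its median) are all assumptions chosen for brevity. A real kernel would presumably pick refinement targets from workload statistics gathered during query processing.

```python
import bisect


class CrackedColumn:
    """Toy partial index over integers (hypothetical, for illustration only)."""

    def __init__(self, values):
        self.data = list(values)  # copy of the column, reorganized in place
        self.pivots = []          # sorted pivot values seen so far
        self.bounds = []          # bounds[i]: first position holding a value >= pivots[i]

    def _crack(self, pivot):
        """Partition the piece that contains `pivot`; return its split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.bounds[i]  # already cracked on this value
        lo = self.bounds[i - 1] if i > 0 else 0
        hi = self.bounds[i] if i < len(self.bounds) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.bounds.insert(i, pos)
        return pos

    def range_query(self, low, high):
        """Return all v with low <= v < high; the index is refined as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]

    def idle_refine(self, steps=1):
        """Assumed idle-time policy: split the largest piece at its median value."""
        for _ in range(steps):
            edges = [0] + self.bounds + [len(self.data)]
            size, lo, hi = max(
                (edges[k + 1] - edges[k], edges[k], edges[k + 1])
                for k in range(len(edges) - 1)
            )
            if size < 2:
                return  # every piece is already trivially small
            pivot = sorted(self.data[lo:hi])[size // 2]
            known = len(self.pivots)
            self._crack(pivot)
            if len(self.pivots) == known:
                return  # no new pivot could be added (e.g. duplicate values)


col = CrackedColumn([7, 2, 9, 4, 1, 8, 3, 6, 5, 0])
hits = col.range_query(3, 7)   # query-driven refinement (adaptive indexing)
col.idle_refine(steps=3)       # extra refinement while the system is idle
```

In this sketch each query only reorganizes the pieces its bounds fall into, so no index is built up front, while `idle_refine` spends spare cycles shrinking the pieces that queries have not touched yet.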

