{"title":"Research challenges in deep reinforcement learning-based join query optimization","authors":"R. Guo, Khuzaima S. Daudjee","doi":"10.1145/3401071.3401657","DOIUrl":"https://doi.org/10.1145/3401071.3401657","url":null,"abstract":"The order in which relations are joined and the physical join operators used are two aspects of query plans which have a significant impact on the execution latency of join queries. However, the set of valid query plans grows exponentially with the number of relations to be joined. Hence, it becomes computationally expensive to enumerate all such plans for a complex join query. Recently, several deep reinforcement learning (DRL) based approaches propose using neural networks to construct a query plan. They demonstrate that efficient query plans can be found without exhaustively enumerating the search space. We integrated our implementation of a DRL-based solution to optimize join order and operators into the PostgreSQL query optimizer. In practice, we found limitations in the quality of the query plans chosen which are not addressed in existing approaches. In this paper we highlight some of these limitations and propose future research challenges along with potential solutions.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116571805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bandit join: preliminary results","authors":"Vahid Ghadakchi, Mian Xie, Arash Termehchy","doi":"10.1145/3401071.3401655","DOIUrl":"https://doi.org/10.1145/3401071.3401655","url":null,"abstract":"Join is arguably the most costly and frequently used operation in relational query processing. Join algorithms usually spend the majority of their time on scanning and attempting to join the parts of the base relations that do not satisfy the join condition and do not generate any results. This causes slow response time, particularly, in interactive and exploratory environments where users would like real-time performance. In this paper, we outline our vision on using online learning and adaptation to execute joins efficiently. In our approach, scan operators that precede a join, learn which parts of the relations are more likely to join during the query execution and produce more results faster by doing fewer I/O accesses. Our empirical studies using standard benchmarks indicate that this approach outperforms similar methods considerably.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122966570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Best of both worlds: combining traditional and machine learning models for cardinality estimation","authors":"Lucas Woltmann, Claudio Hartmann, Dirk Habich, Wolfgang Lehner","doi":"10.1145/3401071.3401658","DOIUrl":"https://doi.org/10.1145/3401071.3401658","url":null,"abstract":"Cardinality estimation is a high-profile technique in database management systems with a serious impact on query performance. Thus, a lot of traditional approaches such as histograms-based or sampling-based methods have been developed over the last decades. With the advance of Machine Learning (ML) into the database world, cardinality estimation profits from several methods improving its quality as shown in different recent papers. However, neither an ML model nor a traditional approach meets all requirements for cardinality estimation, so that a one size fits all approach is difficult to imagine. For that reason, we advocate a better interlacing of ML models and traditional approaches for cardinality estimation and thoroughly consider their potential, advantages, and disadvantages in this paper. We start by proposing a classification of different estimation techniques and their usability for cardinality estimation. Then, we motivate a novel hybrid approach as the core proof of concept of this paper which uses the best of both worlds: ML models and the proven histogram approach. For this, we show in which cases it is beneficial to use ML models or when we can trust the traditional estimators. We evaluate our hybrid approach on two real-world data sets and conclude what can be done to improve the coexistence of traditional and ML approaches in DBMS. With all our proposals, we use ML to improve DBMS without abandoning years of valuable research in cardinality estimation.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131830698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PartLy","authors":"A. S. Abdelhamid, Walid G. Aref","doi":"10.1145/3401071.3401660","DOIUrl":"https://doi.org/10.1145/3401071.3401660","url":null,"abstract":"Data partitioning plays a critical role in data stream processing. Current data partitioning techniques use simple, static heuristics that do not incorporate feedback about the quality of the partitioning decision (i.e., fire and forget strategy). Hence, the data partitioner often repeatedly chooses the same decision. In this paper, we argue that reinforcement learning techniques can be applied to address this problem. The use of artificial neural networks can facilitate learning of efficient partitioning policies. We identify the challenges that emerge when applying machine learning techniques to the data partitioning problem for distributed data stream processing. Furthermore, we introduce PartLy, a proof-of-concept data partitioner, and present preliminary results that indicate PartLy's potential to match the performance of state-of-the-art techniques in terms of partitioning quality, while minimizing storage and processing overheads.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128777163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated tuning of query degree of parallelism via machine learning","authors":"Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi","doi":"10.1145/3401071.3401656","DOIUrl":"https://doi.org/10.1145/3401071.3401656","url":null,"abstract":"Determining the degree of parallelism (DOP) for query execution is of great importance to both performance and resource provisioning. However, recent work that applies machine learning (ML) to query optimization and query performance prediction in relational database management systems (RDBMSs) has ignored the effect of intra-query parallelism. In this work, we argue that determining the optimal or near-optimal DOP for query execution is a fundamental and challenging task that benefits both query performance and cost-benefit tradeoffs. We then present promising preliminary results on how ML techniques can be applied to automate DOP tuning. We conclude with a list of challenges we encountered, as well as future directions for our work.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125271733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RadixSpline: a single-pass learned index","authors":"Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, A. Kemper, Tim Kraska, Thomas Neumann","doi":"10.1145/3401071.3401659","DOIUrl":"https://doi.org/10.1145/3401071.3401659","url":null,"abstract":"Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and are slow to build. In fact, most approaches that we are aware of require multiple training passes over the data. We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. We evaluate RS using the SOSD benchmark and show that it achieves competitive results on all datasets, despite the fact that it only has two parameters.","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131459467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","authors":"","doi":"10.1145/3401071","DOIUrl":"https://doi.org/10.1145/3401071","url":null,"abstract":"","PeriodicalId":371439,"journal":{"name":"Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124782096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}