Christoph Anneser, Mario Petruccelli, Nesime Tatbul, David Cohen, Zhenggang Xu, Prithviraj Pandian, Nikolay Laptev, Ryan Marcus, Alfons Kemper
{"title":"QO-Insight: Inspecting Steered Query Optimizers","authors":"Christoph Anneser, Mario Petruccelli, Nesime Tatbul, David Cohen, Zhenggang Xu, Prithviraj Pandian, Nikolay Laptev, Ryan Marcus, Alfons Kemper","doi":"10.14778/3611540.3611586","DOIUrl":"https://doi.org/10.14778/3611540.3611586","url":null,"abstract":"Steered query optimizers address the planning mistakes of traditional query optimizers by providing them with hints on a per-query basis, thereby guiding them in the right direction. This paper introduces QO-Insight, a visual tool designed for exploring query execution traces of such steered query optimizers. Although steered query optimizers are typically perceived as black boxes, QO-Insight empowers database administrators and experts to gain qualitative insights and enhance their performance through visual inspection and analysis.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134950906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KGNav: A Knowledge Graph Navigational Visual Query System","authors":"Xiang Wang, Xin Wang, Zhaozhuo Li, Dong Han","doi":"10.14778/3611540.3611592","DOIUrl":"https://doi.org/10.14778/3611540.3611592","url":null,"abstract":"Visual query is a vital technique for comprehending and analyzing knowledge graphs, which provides an effective method to lower the barrier of querying knowledge graphs for non-professional users. Nevertheless, visual query techniques for knowledge graphs and ontologies that have emerged in recent years cannot bridge the gap between the global information provided by the knowledge graph schema and the underlying data of the knowledge graph. Thus, they cannot fully exploit the global information to guide users in querying knowledge graphs. This demonstration showcases KGNav, a Knowledge Graph Navigational visual query system. KGNav (1) redefines the minimal unit of operation to abstract the conceptual hierarchy, i.e., Knowledge Graph Schema, in the domain from the original knowledge graph in an offline semi-automatic way through the equivalence relations between these units; it also (2) provides a series of operators and an interactive GUI to capture user query intentions, guiding users to explore the Knowledge Graph Schema to achieve in-depth analysis of knowledge graphs. We will demonstrate the capability of KGNav in reducing tedious queries, enabling users to swiftly grasp the structure of the knowledge graph, and performing queries through several fundamental scenarios.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134997921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ganos Aero: A Cloud-Native System for Big Raster Data Management and Processing","authors":"Fei Xiao, Jiong Xie, Zhida Chen, Feifei Li, Zhen Chen, Jianwei Liu, Yinpei Liu","doi":"10.14778/3611540.3611597","DOIUrl":"https://doi.org/10.14778/3611540.3611597","url":null,"abstract":"The development of Earth Observation technology contributes to the production of massive raster data. It is vital to manage and conduct analytical tasks on the raster data. Existing solutions employ separate dedicated systems for raster data management and processing, incurring problems such as data redundancy, difficulty in updating, expensive data transfer and transformation, etc. To cope with these limitations, this demonstration presents Ganos Aero, a cloud-native system for big raster data management and processing. Ganos Aero proposes a unified raster data model for both data management and processing, which stores a single copy of the raster data without performing an expensive tiling procedure, and thus achieves significant improvement in storage and update efficiency. To enable efficient query and batch task processing, Ganos Aero implements an on-the-fly tile production mechanism, and optimizes its performance using cloud features, including decoupling compute from storage and pushing costly operations closer to the storage layer. Since its deployment in Alibaba Cloud in 2022, Ganos Aero has been playing a critical role in many real applications, including modern agriculture, environment monitoring and protection, etc.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134997931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demonstration of OpenDBML, a Framework for Democratizing In-Database Machine Learning","authors":"Mahdi Ghorbani, Amir Shaikhha","doi":"10.14778/3611540.3611598","DOIUrl":"https://doi.org/10.14778/3611540.3611598","url":null,"abstract":"Machine learning over relational data has been used in several applications. The traditional approach of joining relations first and then training a model on the joined table is time-consuming and requires a significant amount of memory. Recent research has focused on in-database machine learning (in-DB ML) to address this issue; these methods train the models over relations without joining, resulting in a more efficient process. However, such systems have ad-hoc user interfaces and specific data formats, making them challenging to use. To address this problem, this paper presents OpenDBML, a framework for democratizing in-DB ML. OpenDBML offers a Python interface for multiple in-DB ML systems, a set of commonly used datasets, and the ability to add new datasets and in-DB ML systems via both Python and web interfaces. The paper also presents comprehensive demonstration scenarios to illustrate how to use OpenDBML effectively.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SimpleTS: An Efficient and Universal Model Selection Framework for Time Series Forecasting","authors":"Yuanyuan Yao, Dimeng Li, Hailiang Jie, Tianyi Li, Jie Chen, Jiaqi Wang, Feifei Li, Yunjun Gao","doi":"10.14778/3611540.3611561","DOIUrl":"https://doi.org/10.14778/3611540.3611561","url":null,"abstract":"Time series forecasting, which predicts events through a sequence of time, has received increasing attention in past decades. The diverse range of time series forecasting models presents a challenge for selecting the most suitable model for a given dataset. As such, the Alibaba Cloud database monitoring system must address the issue of selecting an optimal forecasting model for a single time series. While several model selection frameworks, including AutoAI-TS, have been developed to predict a dataset, their effectiveness may be limited as they may not adapt well to all types of time series, resulting in reduced prediction accuracy. Alternatively, models such as AutoForecast, which train on individual data points, may offer better adaptability but are limited by the longer training time required. In this paper, we introduce SimpleTS, a versatile framework for time series forecasting that exhibits high efficiency and accuracy across all types of time series data. When performing an online prediction task, SimpleTS first classifies the input time series into one type, and then efficiently selects the most suitable prediction model for this type. To optimize performance, SimpleTS (i) clusters models with similar performance to improve the efficiency of classification; and (ii) uses soft labeling and weighted representation learning to achieve higher classification accuracy for different time series types. Extensive experiments on 3 private datasets and 52 public datasets show that SimpleTS outperforms the state-of-the-art toolkits in terms of both training time and prediction accuracy.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving Hard Variants of Database Schema Matching on Quantum Computers","authors":"Kristin Fritsch, Stefanie Scherzinger","doi":"10.14778/3611540.3611603","DOIUrl":"https://doi.org/10.14778/3611540.3611603","url":null,"abstract":"With quantum computers now available as cloud services, there is a global quest for applications where a quantum advantage can be shown. Naturally, data management is a candidate domain. Workable solutions require the design of hybrid quantum algorithms, where a quantum computing unit (a QPU) and classical computing (via CPUs) cooperate towards solving a problem. This demo illustrates such an end-to-end solution targeting NP-hard variants of database schema matching. Our demo is intended to be educational (and hopefully inspiring), allowing participants to explore the critical design decisions, such as the handover between phases of QPU- and CPU-based computation. It will also allow participants to experience hands-on - through playful interaction - how easily problem sizes exceed the limitations of today's QPUs.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xintong Song, Yusen Zhu, Jianfei Wu, Bai Liu, Hongkang Wei
{"title":"ADOps: An Anomaly Detection Pipeline in Structured Logs","authors":"Xintong Song, Yusen Zhu, Jianfei Wu, Bai Liu, Hongkang Wei","doi":"10.14778/3611540.3611618","DOIUrl":"https://doi.org/10.14778/3611540.3611618","url":null,"abstract":"Anomaly detection has been extensively implemented in industry. The reality is that an application may have numerous scenarios where anomalies need to be monitored. However, the complete process of anomaly detection, including data acquisition, data processing, model training, and model deployment, takes considerable time. In particular, some simple scenarios do not require building complex anomaly detection models, so running the full process for them wastes resources. To solve these problems, we build an anomaly detection pipeline (ADOps) to modularize each step. For simple anomaly detection scenarios, no programming is required and new anomaly detection tasks can be created by simply modifying the configuration file. In addition, it can also improve the development efficiency of complex anomaly detection models. We show how users create anomaly detection tasks on the anomaly detection pipeline and how engineers use it to develop anomaly detection models.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jonas Spenger, Chengyang Huang, Philipp Haller, Paris Carbone
{"title":"Portals: A Showcase of Multi-Dataflow Stateful Serverless","authors":"Jonas Spenger, Chengyang Huang, Philipp Haller, Paris Carbone","doi":"10.14778/3611540.3611619","DOIUrl":"https://doi.org/10.14778/3611540.3611619","url":null,"abstract":"Serverless applications spanning the cloud and edge require flexible programming frameworks for expressing compositions across the different levels of deployment. Another critical aspect for applications with state is failure resilience beyond the scope of a single dataflow graph that is the current standard in data streaming systems. This paper presents Portals, an interactive, stateful dataflow composition framework with strong end-to-end guarantees. Portals enables event-driven, resilient applications that span across dataflow graphs and serverless deployments. The demonstration exhibits three scenarios in our multi-dataflow streaming-based system: dynamically composing a stateful serverless application; an interactive cloud and edge serverless application; and a Portals browser playground.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134998298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anna Povzner, Prince Mahajan, Jason Gustafson, Jun Rao, Ismael Juma, Feng Min, Shriram Sridharan, Nikhil Bhatia, Gopi Attaluri, Adithya Chandra, Stanislav Kozlovski, Rajini Sivaram, Lucas Bradstreet, Bob Barrett, Dhruvil Shah, David Jacot, David Arthur, Ron Dagostino, Colin McCabe, Manikumar Reddy Obili, Kowshik Prakasam, Jose Garcia Sancio, Vikas Singh, Alok Nikhil, Kamal Gupta
{"title":"Kora: A Cloud-Native Event Streaming Platform for Kafka","authors":"Anna Povzner, Prince Mahajan, Jason Gustafson, Jun Rao, Ismael Juma, Feng Min, Shriram Sridharan, Nikhil Bhatia, Gopi Attaluri, Adithya Chandra, Stanislav Kozlovski, Rajini Sivaram, Lucas Bradstreet, Bob Barrett, Dhruvil Shah, David Jacot, David Arthur, Ron Dagostino, Colin McCabe, Manikumar Reddy Obili, Kowshik Prakasam, Jose Garcia Sancio, Vikas Singh, Alok Nikhil, Kamal Gupta","doi":"10.14778/3611540.3611567","DOIUrl":"https://doi.org/10.14778/3611540.3611567","url":null,"abstract":"Event streaming is an increasingly critical infrastructure service used in many industries, and there is growing demand for cloud-native solutions. Confluent Cloud provides a massive-scale event streaming platform built on top of Apache Kafka, with tens of thousands of clusters running in 70+ regions across AWS, Google Cloud, and Azure. This paper introduces Kora, the cloud-native platform for Apache Kafka at the core of Confluent Cloud. We describe Kora's design that enables it to meet its cloud-native goals, such as reliability, elasticity, and cost efficiency. We discuss Kora's abstractions which allow users to think in terms of their workload requirements and not the underlying infrastructure, and we discuss how Kora is designed to provide consistent, predictable performance across cloud environments with diverse capabilities.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135002986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Join Order Selection with Deep Reinforcement Learning: Fundamentals, Techniques, and Challenges","authors":"Zhengtong Yan, Valter Uotila, Jiaheng Lu","doi":"10.14778/3611540.3611576","DOIUrl":"https://doi.org/10.14778/3611540.3611576","url":null,"abstract":"Join Order Selection (JOS) is a fundamental challenge in query optimization, as it significantly affects query performance. However, finding an optimal join order is an NP-hard problem due to the exponentially large search space. Despite the decades-long effort, traditional methods still suffer from limitations. Deep Reinforcement Learning (DRL) approaches have recently gained growing interest and shown superior performance over traditional methods. These DRL-based methods could leverage prior experience through the trial-and-error strategy to automatically explore the optimal join order. This tutorial will focus on recent DRL-based approaches for join order selection by providing a comprehensive overview of the various approaches. We will start by briefly introducing the core concepts of join ordering and the traditional methods for JOS. Next, we will provide some preliminary knowledge about DRL and then delve into DRL-based join order selection approaches by offering detailed information on those methods, analyzing their relationships, and summarizing their weaknesses and strengths. To help the audience gain a deeper understanding of DRL approaches for JOS, we will present two open-source demonstrations and compare their differences. Finally, we will identify research challenges and open problems to provide insights into future research directions. This tutorial will provide valuable guidance for developing more practical DRL approaches for JOS.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135002991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}