AQUA: Automatic Collaborative Query Processing in Analytical Database

IF 3.3 3区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the Vldb Endowment Pub Date : 2023-08-01 DOI:10.14778/3611540.3611607

Yuchen Peng, Ke Chen, Lidan Shou, Dawei Jiang, Gang Chen

{"title":"AQUA: Automatic Collaborative Query Processing in Analytical Database","authors":"Yuchen Peng, Ke Chen, Lidan Shou, Dawei Jiang, Gang Chen","doi":"10.14778/3611540.3611607","DOIUrl":null,"url":null,"abstract":"Data analysts nowadays are keen to have analytical capabilities involving deep learning (DL). Collaborative queries, which employ relational operations to process structured data and DL models to process unstructured data, provide a powerful facility for DL-based in-database analysis. The classical approach to support collaborative queries in relational databases is to integrate DL models with user-defined functions (UDFs) in a general-purpose language (e.g., C++) to process unstructured data. This approach suffers from suboptimal performance as the opaque UDFs preclude the generation of an optimal query plan. A recent work, DL2SQL, addresses the problem of collaborative query optimization by first converting DL computations into SQL subqueries and then using a classical relational query optimizer to optimize the entire collaborative query. However, the DL2SQL approach compromises usability by requiring data analysts to manually manage DL-related data and tune query performance. To this end, this paper introduces AQUA, an analytical database designed for efficient collaborative query processing. Built on DL2SQL, AQUA automates translations from collaborative queries into SQL queries. To enhance usability, AQUA introduces two techniques: 1) a declarative scheme for DL-related data management, and 2) DL-specific optimizations for collaborative query processing, eliminating the burden of manual data management and performance tuning from the data analysts. We demonstrate the key contributions of AQUA via a web APP that allows the audience to perform collaborative queries on the CIFAR-10 dataset.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"18 1","pages":"0"},"PeriodicalIF":3.3000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Vldb Endowment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3611540.3611607","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Data analysts nowadays are keen to have analytical capabilities involving deep learning (DL). Collaborative queries, which employ relational operations to process structured data and DL models to process unstructured data, provide a powerful facility for DL-based in-database analysis. The classical approach to support collaborative queries in relational databases is to integrate DL models with user-defined functions (UDFs) in a general-purpose language (e.g., C++) to process unstructured data. This approach suffers from suboptimal performance as the opaque UDFs preclude the generation of an optimal query plan. A recent work, DL2SQL, addresses the problem of collaborative query optimization by first converting DL computations into SQL subqueries and then using a classical relational query optimizer to optimize the entire collaborative query. However, the DL2SQL approach compromises usability by requiring data analysts to manually manage DL-related data and tune query performance. To this end, this paper introduces AQUA, an analytical database designed for efficient collaborative query processing. Built on DL2SQL, AQUA automates translations from collaborative queries into SQL queries. To enhance usability, AQUA introduces two techniques: 1) a declarative scheme for DL-related data management, and 2) DL-specific optimizations for collaborative query processing, eliminating the burden of manual data management and performance tuning from the data analysts. We demonstrate the key contributions of AQUA via a web APP that allows the audience to perform collaborative queries on the CIFAR-10 dataset.

查看原文本刊更多论文

分析数据库中的自动协同查询处理

如今，数据分析师渴望拥有涉及深度学习(DL)的分析能力。协作查询使用关系操作处理结构化数据，使用DL模型处理非结构化数据，为基于DL的数据库内分析提供了强大的工具。在关系数据库中支持协作查询的经典方法是使用通用语言(例如c++)将深度学习模型与用户定义函数(udf)集成在一起，以处理非结构化数据。这种方法的性能不是最优的，因为不透明的udf排除了最优查询计划的生成。最近的一项工作，DL2SQL，通过首先将DL计算转换为SQL子查询，然后使用经典的关系查询优化器来优化整个协作查询，解决了协作查询优化问题。但是，DL2SQL方法要求数据分析人员手动管理与dl相关的数据并调优查询性能，从而损害了可用性。为此，本文介绍了AQUA，一个为高效协同查询处理而设计的分析数据库。AQUA建立在DL2SQL之上，可以自动将协作查询转换为SQL查询。为了增强可用性，AQUA引入了两种技术:1)用于与dl相关的数据管理的声明式方案，以及2)用于协作查询处理的特定于dl的优化，从而消除了数据分析师手动数据管理和性能调优的负担。我们通过一个web应用程序演示了AQUA的关键贡献，该应用程序允许观众在CIFAR-10数据集上执行协作查询。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the Vldb Endowment Computer Science-General Computer Science

CiteScore

7.70

自引率

0.00%

发文量

期刊介绍： The Proceedings of the VLDB (PVLDB) welcomes original research papers on a broad range of research topics related to all aspects of data management, where systems issues play a significant role, such as data management system technology and information management infrastructures, including their very large scale of experimentation, novel architectures, and demanding applications as well as their underpinning theory. The scope of a submission for PVLDB is also described by the subject areas given below. Moreover, the scope of PVLDB is restricted to scientific areas that are covered by the combined expertise on the submission’s topic of the journal’s editorial board. Finally, the submission’s contributions should build on work already published in data management outlets, e.g., PVLDB, VLDBJ, ACM SIGMOD, IEEE ICDE, EDBT, ACM TODS, IEEE TKDE, and go beyond a syntactic citation.