Proceedings of the VLDB Endowment: Latest Articles

Full-Power Graph Querying: State of the Art and Challenges
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611577
Ioana Manolescu, Madhulika Mohanty
Abstract: Graph databases are enjoying enormous popularity, through both their RDF and Property Graphs (PG) incarnations, in a variety of applications. To query graphs, query languages provide structured, as well as unstructured primitives. While structured queries allow expressing precise information needs, they are unsuited for exploring unfamiliar datasets, as they require prior knowledge of the schema and structure of the dataset. Prior research on keyword search in graph databases does not suffer from this limitation. However, keyword queries do not allow expressing precise search criteria when users do know some. This tutorial (1.5 hours) builds a continuum between structured graph querying through languages such as SPARQL and GPML, a recently proposed standard for PG querying, on one hand, and graph keyword search, on the other hand. In this space between querying and information retrieval, we analyze the features of modern query languages that go toward unstructured search, discuss their strengths and limitations, and compare their computational complexity. In particular, we focus on (i) lessons learned from the rich literature of graph keyword search, in particular with respect to result scoring; (ii) language mechanisms for integrating both complex structured querying and powerful methods to search for connections users do not know in advance. We conclude by discussing the open challenges and future work directions.
Citations: 0
A Learned Query Rewrite System
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611633
Xuanhe Zhou, Guoliang Li, Jianming Wu, Jiesi Liu, Zhaoyan Sun, Xinning Zhang
Abstract: Query rewriting is a challenging task that transforms a SQL query to improve its performance while maintaining its result set. However, it is difficult to rewrite SQL queries, which often involve complex logical structures, and there are numerous candidate rewrite strategies for such queries, making it an NP-hard problem. Existing databases or query optimization engines adopt heuristics to rewrite queries, but these approaches may not be able to judiciously and adaptively apply the rewrite rules and may cause significant performance regression in some cases (e.g., correlated subqueries may not be eliminated). To address these limitations, we introduce LearnedRewrite, a query rewrite system that combines traditional and learned algorithms (i.e., Monte Carlo tree search + hybrid estimator) to rewrite queries. We have implemented the system in Calcite, and experimental results demonstrate LearnedRewrite achieves superior performance on three real datasets.
Citations: 1
Towards Auto-Generated Data Systems
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611635
Alvin Cheung, Maaz Bin Safeer Ahmad, Brandon Haynes, Chanwut Kittivorawong, Shadaj Laddad, Xiaoxuan Liu, Chenglong Wang, Cong Yan
Abstract: After decades of progress, database management systems (DBMSs) are now the backbones of many data applications that we interact with on a daily basis. Yet, with the emergence of new data types and hardware, building and optimizing new data systems remain as difficult as in the heyday of relational databases. In this paper, we summarize our work towards automating the building and optimization of data systems. Drawing from our own experience, we further argue that any automation technique must address three aspects: user specification, code generation, and result validation. We conclude by discussing a case study using video data processing, along with opportunities for future research towards designing data systems that are automatically generated.
Citations: 0
Erica: Query Refinement for Diversity Constraint Satisfaction
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611623
Jinyang Li, Alon Silberstein, Yuval Moskovitch, Julia Stoyanovich, H. V. Jagadish
Abstract: Relational queries are commonly used to support decision making in critical domains like hiring and college admissions. For example, a college admissions officer may need to select a subset of the applicants for in-person interviews, who individually meet the qualification requirements (e.g., have a sufficiently high GPA) and are collectively demographically diverse (e.g., include a sufficient number of candidates of each gender and of each race). However, traditional relational queries only support selection conditions checked against each input tuple, and they do not support diversity conditions checked against multiple, possibly overlapping, groups of output tuples. To address this shortcoming, we present Erica, an interactive system that proposes minimal modifications for selection queries to have them satisfy constraints on the cardinalities of multiple groups in the result. We demonstrate the effectiveness of Erica using several real-life datasets and diversity requirements.
Citations: 0
XDB in Action: Decentralized Cross-Database Query Processing for Black-Box DBMSes
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611625
Haralampos Gavriilidis, Leonhard Rose, Joel Ziegler, Kaustubh Beedkar, Jorge-Arnulfo Quiané-Ruiz, Volker Markl
Abstract: Data are naturally produced at different locations and hence stored on different DBMSes. To maximize the value of the collected data, today's users combine data from different sources. Research in data integration has proposed the Mediator-Wrapper (MW) architecture to enable ad-hoc query processing over multiple sources. The MW approach is desirable for users, as they do not need to deal with heterogeneous data sources. However, from a query processing perspective, the MW approach is inefficient: First, one needs to provision the mediating execution engine with resources. Second, during query processing, data gets "centralized" within the mediating engine, which causes redundant data movement. Recently, we proposed in-situ cross-database query processing, a paradigm for federated query processing without a mediating engine. Our approach optimizes runtime performance and reduces data movement by leveraging existing systems, eliminating the need for an additional federated query engine. In this demonstration, we showcase XDB, our prototype for in-situ cross-database query processing. We demonstrate several aspects of XDB, i.e., the cross-database environment, our optimization techniques, and its decentralized execution phase.
Citations: 0
TPCx-AI - An Industry Standard Benchmark for Artificial Intelligence and Machine Learning Systems
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611554
Christoph Brücke, Philipp Härtling, Rodrigo D Escobar Palacios, Hamesh Patel, Tilmann Rabl
Abstract: Artificial intelligence (AI) and machine learning (ML) techniques have existed for years, but new hardware trends and advances in model training and inference have radically improved their performance. With an ever-increasing number of algorithms, systems, and hardware solutions, it is challenging to identify good deployments even for experts. Researchers and industry experts have observed this challenge and have created several benchmark suites for AI and ML applications and systems. While they are helpful in comparing several aspects of AI applications, none of the existing benchmarks measures end-to-end performance of ML deployments. Many have been rigorously developed in collaboration between academia and industry, but no existing benchmark is standardized. In this paper, we introduce the TPC Express Benchmark for Artificial Intelligence (TPCx-AI), the first industry standard benchmark for end-to-end machine learning deployments. TPCx-AI is the first AI benchmark that represents the pipelines typically found in common ML and AI workloads. TPCx-AI provides a full software kit, which includes a data generator, a driver, and two full workload implementations, one based on Python libraries and one based on Apache Spark. We describe the complete benchmark and show benchmark results for various scale factors. TPCx-AI's core contributions are a novel unified data set covering structured and unstructured data; a fully scalable data generator that can generate realistic data from GB up to PB scale; and a diverse and representative workload using different data types and algorithms, covering a wide range of aspects of real ML workloads such as data integration, data processing, training, and inference.
Citations: 1
Demo of QueryBooster: Supporting Middleware-Based SQL Query Rewriting as a Service
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611615
Qiushi Bai, Sadeem Alsudais, Chen Li
Abstract: Query rewriting is an important technique to optimize SQL performance in databases. With the prevalent use of business intelligence systems and object-relational mapping frameworks, existing rewriting capabilities inside databases are insufficient to optimize machine-generated queries. In this paper, we propose a novel system called "QueryBooster," to support SQL query rewriting as a cloud service. It provides a powerful and easy-to-use Web interface for users to formulate rewriting rules via a language or express rewriting intentions by providing example query pairs. It allows multiple users to share rewriting knowledge and automatically suggests shared rewriting rules for users. It requires no modifications or plugin installations to applications or databases. In this demonstration, we use real-world applications and datasets to show the user experience of QueryBooster to rewrite their application queries and share rewriting knowledge.
Citations: 0
Data and AI Model Markets: Opportunities for Data and Model Sharing, Discovery, and Integration
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611573
Jian Pei, Raul Castro Fernandez, Xiaohui Yu
Abstract: The markets for data and AI models are rapidly emerging and increasingly significant in the realm and the practices of data science and artificial intelligence. These markets are being studied from diverse perspectives, such as e-commerce, economics, machine learning, and data management. In light of these developments, there is a pressing need to present a comprehensive and forward-looking survey on the subject to the database and data management community. In this tutorial, we aim to provide a comprehensive and interdisciplinary introduction to data and AI model markets. Unlike a few recent surveys and tutorials that concentrate only on the economics aspect, we take a novel perspective and examine data and AI model markets as grand opportunities to address the long-standing problem of data and model sharing, discovery, and integration. We motivate the importance of data and model markets using practical examples, present the current industry landscape of such markets, and explore the modules and options of such markets from multiple dimensions, including assets in the markets (e.g., data versus models), platforms, and participants. Furthermore, we summarize the latest advancements and examine the future directions of data and AI model markets as mechanisms for enabling and facilitating sharing, discovery, and integration.
Citations: 0
Showcasing Data Management Challenges for Future IoT Applications with NebulaStream
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611588
Aljoscha Lepping, Hoang Mi Pham, Laura Mons, Balint Rueb, Philipp M. Grulich, Ankit Chaudhary, Steffen Zeuch, Volker Markl
Abstract: Data management systems will face several new challenges in supporting IoT applications during the coming years. These challenges arise from managing large numbers of heterogeneous IoT devices and require combining elastic cloud and fog resources in unified fog-cloud environments. In this demonstration, we introduce a smart city simulation called IoTropolis and use it to create interactive eHealth and Smart Grid application scenarios. We use these scenarios to showcase three key challenges of unified fog-cloud environments. Furthermore, we demonstrate how NebulaStream, our recently proposed data management system for the IoT, addresses these challenges. Visitors to our demonstration can configure and interact with the scenarios to manage electricity usage in IoTropolis or to distribute patients across different hospitals. Thereby, visitors can actively engage with the challenges showcased by IoTropolis and utilize NebulaStream to address them. As a result, our demonstration enables visitors to experience data management for future IoT applications.
Citations: 0
Cornet: Learning Spreadsheet Formatting Rules by Example
Zone 3 · Computer Science
Proceedings of the VLDB Endowment · Pub Date: 2023-08-01 · DOI: 10.14778/3611540.3611620
Mukul Singh, José Cambronero Sanchez, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen
Abstract: Data management and analysis tasks are often carried out using spreadsheet software. A popular feature in most spreadsheet platforms is the ability to define data-dependent formatting rules. These rules can express actions such as "color red all entries in a column that are negative" or "bold all rows not containing error or failure". Unfortunately, users who want to exercise this functionality need to manually write these conditional formatting (CF) rules. We introduce Cornet, a system that automatically learns such conditional formatting rules from user examples. Cornet takes inspiration from inductive program synthesis and combines symbolic rule enumeration, based on semi-supervised clustering and iterative decision tree learning, with a neural ranker to produce accurate conditional formatting rules. In this demonstration, we show Cornet in action as a simple add-in to Microsoft's Excel. After the user provides one or two formatted cells as examples, Cornet generates formatting rule suggestions for the user to apply to the spreadsheet.
Citations: 0