arXiv - CS - Databases: Latest Papers (Page 3)

Intelligent Transaction Scheduling via Conflict Prediction in OLTP DBMS

arXiv - CS - Databases Pub Date : 2024-09-03 DOI: arxiv-2409.01675

Tieying Zhang, Anthony Tomasic, Andrew Pavlo

Abstract: Current architectures for main-memory online transaction processing (OLTP) database management systems (DBMS) typically use random scheduling to assign transactions to threads. This approach achieves uniform load across threads but it ignores the likelihood of conflicts between transactions. If the DBMS could estimate the potential for transaction conflict and then intelligently schedule transactions to avoid conflicts, then the system could improve its performance. Such estimation of transaction conflict, however, is non-trivial for several reasons. First, conflicts occur under complex conditions that are far removed in time from the scheduling decision. Second, transactions must be represented in a compact and efficient manner to allow for fast conflict detection. Third, given some evidence of potential conflict, the DBMS must schedule transactions in such a way that minimizes this conflict. In this paper, we systematically explore the design decisions for solving these problems. We then empirically measure the performance impact of different representations on standard OLTP benchmarks. Our results show that intelligent scheduling using a history increases throughput by ~40% on a 20-core machine.
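
The scheduling idea in the abstract above can be illustrated with a small sketch. This is not the authors' method: it assumes, purely for illustration, that each transaction is represented by the set of keys it touches, that each worker thread keeps a short history of recently assigned key sets, and that routing a transaction to the worker with the most key overlap makes likely conflicts run serially on one thread. The class name, `history_size`, and the tie-breaking rule are all hypothetical.

```python
from collections import deque


class ConflictAwareScheduler:
    """Toy scheduler (illustrative only): route a transaction to the worker
    whose recent history shares the most keys with it."""

    def __init__(self, num_workers, history_size=4):
        # One bounded history of recent key sets per worker (assumed design).
        self.histories = [deque(maxlen=history_size) for _ in range(num_workers)]
        # Number of transactions assigned to each worker so far.
        self.load = [0] * num_workers

    def _overlap(self, worker, keys):
        # Count how many of this transaction's keys were seen recently on the worker.
        seen = set().union(*self.histories[worker])
        return len(seen & keys)

    def assign(self, keys):
        keys = frozenset(keys)
        # Prefer the highest overlap; break ties by lowest load, then lowest index.
        best = min(
            range(len(self.histories)),
            key=lambda w: (-self._overlap(w, keys), self.load[w], w),
        )
        self.histories[best].append(keys)
        self.load[best] += 1
        return best
```

With two workers, transactions touching keys {a, b} and then {b, d} are both routed to worker 0 because they share key b, while a following transaction touching only {c} goes to the less loaded worker 1. When no history matches, the sketch falls back to load balancing, which mirrors the uniform-load goal that random scheduling serves.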

Citations: 0

Computing Range Consistent Answers to Aggregation Queries via Rewriting

arXiv - CS - Databases Pub Date : 2024-09-03 DOI: arxiv-2409.01648

Aziz Amezian El Khalfioui, Jef Wijsen

Citations: 0

SpannerLib: Embedding Declarative Information Extraction in an Imperative Workflow

arXiv - CS - Databases Pub Date : 2024-09-03 DOI: arxiv-2409.01736

Dean Light, Ahmad Aiashy, Mahmoud Diab, Daniel Nachmias, Stijn Vansummeren, Benny Kimelfeld

Citations: 0

Multilevel Verification on a Single Digital Decentralized Distributed (DDD) Ledger

arXiv - CS - Databases Pub Date : 2024-09-03 DOI: arxiv-2409.11410

Ayush Thada, Aanchal Kandpal, Dipanwita Sinha Mukharjee

Citations: 0

BEAVER: An Enterprise Benchmark for Text-to-SQL

arXiv - CS - Databases Pub Date : 2024-09-03 DOI: arxiv-2409.02038

Peter Baile Chen, Fabian Wenz, Yi Zhang, Moe Kayali, Nesime Tatbul, Michael Cafarella, Çağatay Demiralp, Michael Stonebraker

Abstract: Existing text-to-SQL benchmarks have largely been constructed using publicly available tables from the web with human-generated tests containing question and SQL statement pairs. They typically show very good results and lead people to think that LLMs are effective at text-to-SQL tasks. In this paper, we apply off-the-shelf LLMs to a benchmark containing enterprise data warehouse data. In this environment, LLMs perform poorly, even when standard prompt engineering and RAG techniques are utilized. As we will show, the reasons for poor performance are largely due to three characteristics: (1) public LLMs cannot train on enterprise data warehouses because they are largely in the "dark web", (2) schemas of enterprise tables are more complex than the schemas in public data, which makes the SQL-generation task innately harder, and (3) business-oriented questions are often more complex, requiring joins over multiple tables and aggregations. As a result, we propose a new dataset BEAVER, sourced from real enterprise data warehouses together with natural language queries and their correct SQL statements which we collected from actual user history. We evaluated this dataset using recent LLMs and demonstrated their poor performance on this task. We hope this dataset will facilitate future researchers building more sophisticated text-to-SQL systems which can do better on this important class of data.

Citations: 0

Towards Split Learning-based Privacy-Preserving Record Linkage

arXiv - CS - Databases Pub Date : 2024-09-02 DOI: arxiv-2409.01088

Michail Zervas, Alexandros Karakasidis

Citations: 0

GQL and SQL/PGQ: Theoretical Models and Expressive Power

arXiv - CS - Databases Pub Date : 2024-09-02 DOI: arxiv-2409.01102

Amélie Gheerbrant, Leonid Libkin, Liat Peterfreund, Alexandra Rogova

Abstract: SQL/PGQ and GQL are very recent international standards for querying property graphs: SQL/PGQ specifies how to query relational representations of property graphs in SQL, while GQL is a standalone language for graph databases. The rapid industrial development of these standards left the academic community trailing in its wake. While digests of the languages have appeared, we do not yet have concise foundational models like relational algebra and calculus for relational databases that enable the formal study of languages, including their expressiveness and limitations. At the same time, work on the next versions of the standards has already begun, to address the perceived limitations of their first versions. Motivated by this, we initiate a formal study of SQL/PGQ and GQL, concentrating on their concise formal model and expressiveness. For the former, we define simple core languages -- Core GQL and Core PGQ -- that capture the essence of the new standards, are amenable to theoretical analysis, and fully clarify the difference between PGQ's bottom-up evaluation versus GQL's linear, or pipelined approach. Equipped with these models, we both confirm the necessity to extend the language to fill in the expressiveness gaps and identify the source of these deficiencies. We complement our theoretical analysis with an experimental study, demonstrating that existing workarounds in full GQL and PGQ are impractical, which further underscores the necessity to correct deficiencies in the language design.

Citations: 0

Serverless Query Processing with Flexible Performance SLAs and Prices

arXiv - CS - Databases Pub Date : 2024-09-02 DOI: arxiv-2409.01388

Haoqiong Bian, Dongyang Geng, Yunpeng Chai, Anastasia Ailamaki

Abstract: Serverless query processing has become increasingly popular due to its auto-scaling, high elasticity, and pay-as-you-go pricing. It allows cloud data warehouse (or lakehouse) users to focus on data analysis without the burden of managing systems and resources. Accordingly, in serverless query services, users become more concerned about cost-efficiency under acceptable performance than performance under fixed resources. This poses new challenges for serverless query engine design in providing flexible performance service-level agreements (SLAs) and cost-efficiency (i.e., prices). In this paper, we first define the problem of flexible performance SLAs and prices in serverless query processing and discuss its significance. Then, we envision the challenges and solutions for solving this problem and the opportunities it raises for other database research. Finally, we present PixelsDB, an open-source prototype with three service levels supported by dedicated architectural designs. Evaluations show that PixelsDB reduces resource costs by 65.5% for near-real-world workloads generated by Cloud Analytics Benchmark (CAB) while not violating the pending time guarantees.

Citations: 0

Optimizing Traversal Queries of Sensor Data Using a Rule-Based Reachability Approach

arXiv - CS - Databases Pub Date : 2024-08-30 DOI: arxiv-2408.17157

Bryan-Elliott Tam, Ruben Taelman, Julián Rojas Meléndez, Pieter Colpaert

Abstract: Link Traversal queries face challenges in completeness and long execution time due to the size of the web. Reachability criteria define completeness by restricting the links followed by engines. However, the number of links to dereference remains the bottleneck of the approach. Web environments often have structures exploitable by query engines to prune irrelevant sources. Current criteria rely on using information from the query definition and predefined predicates. However, it is difficult to use them to traverse environments where logical expressions indicate the location of resources. We propose to use a rule-based reachability criterion that captures logical statements expressed in hypermedia descriptions within linked data documents to prune irrelevant sources. In this poster paper, we show how the Comunica link traversal engine is modified to take hints from a hypermedia control vocabulary, to prune irrelevant sources. Our preliminary findings show that by using this strategy, the query engine can significantly reduce the number of HTTP requests and the query execution time without sacrificing the completeness of results. Our work shows that the investigation of hypermedia controls in link pruning of traversal queries is a worthy effort for optimizing web queries of unindexed decentralized databases.
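
The pruning idea in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not Comunica's API or the paper's vocabulary: it assumes each outgoing link is annotated with a closed numeric interval describing the values reachable through it (for example, a timestamp range of a sensor data fragment), and that the query filters on a closed interval of the same variable. The `Link` type, its fields, and `prune_links` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    url: str
    lower: float  # assumed: smallest value reachable through this link
    upper: float  # assumed: largest value reachable through this link


def prune_links(links, query_lower, query_upper):
    """Keep only links whose declared interval overlaps the query's interval.

    Links that cannot contain matching data are never dereferenced,
    which is what saves HTTP requests in this sketch.
    """
    return [
        link
        for link in links
        if link.lower <= query_upper and link.upper >= query_lower
    ]
```

Given three fragments covering [0, 10], [10, 20], and [20, 30], a query restricted to [12, 18] would follow only the second link. Completeness is preserved in this sketch only if the declared intervals are accurate, which is the role the hypermedia descriptions play in the abstract.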

Citations: 0

Empowering Open Data Sharing for Social Good: A Privacy-Aware Approach

arXiv - CS - Databases Pub Date : 2024-08-30 DOI: arxiv-2408.17378

Tânia Carvalho, Luís Antunes, Cristina Costa, Nuno Moniz

Abstract: The Covid-19 pandemic has affected the world at multiple levels. Data sharing was pivotal for advancing research to understand the underlying causes and implement effective containment strategies. In response, many countries have promoted the availability of daily cases to support research initiatives, fostering collaboration between organisations and making such data available to the public through open data platforms. Despite the several advantages of data sharing, one of the major concerns before releasing health data is its impact on individuals' privacy. Such a sharing process should be based on state-of-the-art methods in Data Protection by Design and by Default. In this paper, we use a data set related to Covid-19 cases in the second largest hospital in Portugal to show how it is feasible to ensure data privacy while improving the quality and maintaining the utility of the data. Our goal is to demonstrate how knowledge exchange in multidisciplinary teams of healthcare practitioners, data privacy, and data science experts is crucial to co-developing strategies that ensure high utility of de-identified data.

Citations: 0