Proc. VLDB Endow.: Latest Articles (Page 4)

Why Not Yet: Fixing a Top-k Ranking that Is Not Fair to Individuals

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598606

Zixuan Chen, P. Manolios, Mirek Riedewald

Abstract: This work considers why-not questions in the context of top-k queries and score-based ranking functions. Following the popular linear scalarization approach for multi-objective optimization, we study rankings based on the weighted sum of multiple scores. A given weight choice may be controversial or perceived as unfair to certain individuals or organizations, triggering the question why some entity of interest has not yet shown up in the top-k. We introduce various notions of such why-not-yet queries and formally define them as satisfiability or optimization problems, whose goal is to propose alternative ranking functions that address the placement of the entities of interest. While some why-not-yet problems have linear constraints, others require quantifiers, disjunction, and negation. We propose several optimizations, ranging from a monotonic-core construction that approximates the complex constraints with a conjunction of linear ones, to various techniques that let the user control the tradeoff between running time and approximation quality. Experiments with real and synthetic data demonstrate the practicality and scalability of our technique, showing its superiority compared to the state of the art (SOA).

Pages: 2377-2390
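To make the weighted-sum setting in this abstract concrete, the following is a minimal sketch, assuming higher scores are better and weights lie on the unit simplex. It ranks entities by a weighted sum and then brute-forces a weight grid for the closest alternative weights that place a chosen entity in the top-k. The function names, the L1 distance, and the grid search are illustrative assumptions; they stand in for, and are much simpler than, the satisfiability and optimization formulations the paper describes.

```python
from itertools import product


def rank_of(scores, weights, target):
    """Return the 1-based rank of `target` under the weighted-sum score."""
    def total(row):
        return sum(w * s for w, s in zip(weights, row))

    t = total(scores[target])
    # Every entity with a strictly higher weighted sum is ranked ahead.
    ahead = sum(1 for name, row in scores.items()
                if name != target and total(row) > t)
    return 1 + ahead


def nearest_weights_in_top_k(scores, weights, target, k, steps=20):
    """Grid-search the weight simplex for the weights closest (in L1
    distance) to `weights` that place `target` in the top-k.
    Returns None if no grid point works. Hypothetical helper."""
    dims = len(weights)
    best = None
    for combo in product(range(steps + 1), repeat=dims - 1):
        if sum(combo) > steps:
            continue  # outside the simplex
        cand = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        if rank_of(scores, cand, target) <= k:
            dist = sum(abs(a - b) for a, b in zip(cand, weights))
            if best is None or dist < best[0]:
                best = (dist, cand)
    return None if best is None else best[1]


# Made-up example data: three entities, two score attributes each.
scores = {"a": (0.9, 0.1), "b": (0.8, 0.3), "c": (0.2, 0.95)}
original = (0.8, 0.2)
fixed = nearest_weights_in_top_k(scores, original, "c", k=2)
```

The grid search grows exponentially with the number of score attributes, which is one reason a solver-based formulation with linear constraints is attractive in practice.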

Citations: 0

LEON: A New Framework for ML-Aided Query Optimization

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598597

Xu Chen, Haitian Chen, Zibo Liang, Shuncheng Liu, Jinghong Wang, Kai Zeng, Han Su, Kai Zheng

Abstract: Query optimization has long been a fundamental yet challenging topic in the database field. With the prosperity of machine learning (ML), some recent works have shown the advantages of reinforcement learning (RL) based learned query optimizers. However, they suffer from fundamental limitations due to the data-driven nature of ML. Motivated by the ML characteristics and database maturity, we propose LEON, a framework for ML-aidEd query OptimizatioN. LEON improves the expert query optimizer to self-adjust to the particular deployment by leveraging ML and the fundamental knowledge in the expert query optimizer. To train the ML model, a pairwise ranking objective is proposed, which is substantially different from the previous regression objective. To help the optimizer escape local minima and avoid failure, a ranking and uncertainty-based exploration strategy is proposed, which discovers the valuable plans to aid the optimizer. Furthermore, an ML model-guided pruning is proposed to increase the planning efficiency without hurting performance too much. Extensive experiments offer evidence that the proposed framework can outperform the state-of-the-art methods in terms of end-to-end latency performance, training efficiency, and stability.

Pages: 2261-2273
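The difference between a pairwise ranking objective and a regression objective can be illustrated with a small sketch. The abstract does not give LEON's exact loss, so the code below assumes a generic pairwise logistic loss: for every pair of plans where one is truly faster, the model is penalized unless it scores the faster plan lower. The function name and inputs are hypothetical.

```python
import math


def pairwise_ranking_loss(pred_costs, true_latencies):
    """Mean logistic loss over all plan pairs ordered by true latency.

    Only the relative order of the predictions matters, not how close
    each predicted cost is to the measured latency."""
    losses = []
    n = len(pred_costs)
    for i in range(n):
        for j in range(n):
            if true_latencies[i] < true_latencies[j]:
                # Plan i is truly faster, so its predicted cost should be
                # lower: a positive margin means the pair is ordered correctly.
                margin = pred_costs[j] - pred_costs[i]
                losses.append(math.log1p(math.exp(-margin)))
    return sum(losses) / len(losses) if losses else 0.0


latencies = [10.0, 20.0, 30.0]
good = pairwise_ranking_loss([1.0, 2.0, 3.0], latencies)  # correct order
bad = pairwise_ranking_loss([3.0, 2.0, 1.0], latencies)   # reversed order
```

Note that the correctly ordered predictions get a low loss even though their absolute values are far from the latencies, which a regression loss would penalize heavily.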

Citations: 1

BICE: Exploring Compact Search Space by Using Bipartite Matching and Cell-Wide Verification

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598591

Yunyoung Choi, Kunsoo Park, Hyunjoon Kim

Citations: 0

Text Indexing for Long Patterns: Anchors are All you Need

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598586

Lorraine A. K. Ayad, G. Loukides, S. Pissis

Abstract: In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to represent such string datasets in a compact form but also to simultaneously enable fast pattern matching queries. This is the classic text indexing problem. The four absolute measures anyone should pay attention to when designing or implementing a text index are: (i) index space; (ii) query time; (iii) construction space; and (iv) construction time. Unfortunately, however, most (if not all) widely-used indexes (e.g., suffix tree, suffix array, or their compressed counterparts) are not optimized for all four measures simultaneously, as it is difficult to have the best of all four worlds. Here, we take an important step in this direction by showing that text indexing with locally consistent anchors (lc-anchors) offers remarkably good performance in all four measures, when we have at hand a lower bound l on the length of the queried patterns, which is arguably a quite reasonable assumption in practical applications. Specifically, we improve on the construction of the index proposed by Loukides and Pissis, which is based on bidirectional string anchors (bd-anchors), a new type of lc-anchors, by: (i) designing an average-case linear-time algorithm to compute bd-anchors; and (ii) developing a semi-external-memory implementation to construct the index in small space using near-optimal work. We then present an extensive experimental evaluation, based on the four measures, using real benchmark datasets. The results show that, for long patterns, the index constructed using our improved algorithms compares favorably to all classic indexes: (compressed) suffix tree; (compressed) suffix array; and the FM-index.

Pages: 2117-2131
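The abstract does not define bd-anchors, so the sketch below swaps in minimizers, a different and widely known anchoring scheme, purely to illustrate what "locally consistent" means: the anchors chosen inside a sufficiently long fragment depend only on that fragment, so every occurrence of a long pattern in the text carries the same anchors. The function name and the parameters `window` and `k` are assumptions for illustration.

```python
def minimizer_anchors(text, window, k):
    """Return the sorted positions of the leftmost lexicographically
    smallest k-mer in every run of `window` consecutive k-mers.

    This is the minimizer scheme, not the bd-anchor scheme of the paper."""
    anchors = set()
    span = window + k - 1  # characters covered by one run of k-mers
    for start in range(len(text) - span + 1):
        # Ties are broken by position, so the leftmost smallest k-mer wins.
        best = min(range(start, start + window),
                   key=lambda i: (text[i:i + k], i))
        anchors.add(best)
    return sorted(anchors)


pattern = "ACGTACGGT"
text = "xx" + pattern + "yy" + pattern + "zz"
pattern_anchors = minimizer_anchors(pattern, window=3, k=3)
text_anchors = set(minimizer_anchors(text, window=3, k=3))
```

Because only anchored positions need to be indexed, a lower bound on the pattern length (here `window + k - 1`) guarantees that every pattern occurrence contains at least one indexed position.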

Citations: 1

Maximal D-truss Search in Dynamic Directed Graphs

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598592

Anxin Tian, Alexander Zhou, Yue Wang, Lei Chen

Citations: 1

Pando: Enhanced Data Skipping with Logical Data Partitioning

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598601

Sivaprasad Sudhir, Wenbo Tao, N. Laptev, Cyrille Habis, Michael J. Cafarella, S. Madden

Abstract: With enormous volumes of data, quickly retrieving data that is relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition the data into blocks and employ various techniques to skip irrelevant blocks during query execution. Several algorithms, often based on historical properties of a workload of queries run over the data, have been proposed to tune the physical layout of data to reduce the number of blocks accessed. The effectiveness of these methods at skipping blocks depends on what metadata is stored and how well the physical data layout aligns with the queries. Existing work on automatic physical database design misses significant opportunities in skipping blocks because it ignores logical predicates in the workload that exhibit strongly correlated results. In this paper, we present Pando which enables significantly better block skipping than past methods by informing physical layout decisions with correlation-aware logical partitioning. Across a range of benchmark and real-world workloads, Pando attains up to 2.8X reduction in the number of blocks scanned and up to 2.3X speedup in end-to-end query execution time over the state-of-the-art techniques.

Pages: 2316-2329
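The dependence of block skipping on stored metadata and physical layout can be shown with a minimal sketch. It assumes the simplest kind of per-block metadata, a min/max range (often called a zone map), and a range predicate on one column. This is a generic data-skipping illustration, not Pando's correlation-aware partitioning; all names are hypothetical.

```python
def build_zone_maps(rows, block_size, column):
    """Split rows into fixed-size blocks and record min/max of `column`."""
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    return [(min(r[column] for r in b), max(r[column] for r in b), b)
            for b in blocks]


def scan(zone_maps, lo, hi, column):
    """Return rows with lo <= row[column] <= hi and the number of blocks read."""
    scanned = 0
    out = []
    for bmin, bmax, block in zone_maps:
        if bmax < lo or bmin > hi:
            continue  # metadata proves no row in this block can match
        scanned += 1
        out.extend(r for r in block if lo <= r[column] <= hi)
    return out, scanned


# Same 100 values stored in two layouts: sorted, and scattered by a
# fixed permutation (37 is coprime with 100).
sorted_rows = [{"x": i} for i in range(100)]
scattered_rows = [{"x": (i * 37) % 100} for i in range(100)]

hits_sorted, blocks_sorted = scan(
    build_zone_maps(sorted_rows, 10, "x"), 25, 34, "x")
hits_scattered, blocks_scattered = scan(
    build_zone_maps(scattered_rows, 10, "x"), 25, 34, "x")
```

Both layouts return the same ten rows, but the sorted layout reads only the two blocks whose ranges overlap the predicate, while the scattered layout has wide per-block ranges and skips far less.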

Citations: 0

WiscSort: External Sorting For Byte-Addressable Storage

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598585

Vinay Banakar, Kan Wu, Yuvraj Patel, K. Keeton, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

Citations: 1

Opportunities for Quantum Acceleration of Databases: Optimization of Queries and Transaction Schedules

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598603

Umut Çalikyilmaz, Sven Groppe, Jinghua Groppe, Tobias Winker, S. Prestel, Farida Shagieva, Daanish Arya, F. Preis, L. Gruenwald

Citations: 5

SEIDEN: Revisiting Query Processing in Video Database Systems

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598599

J. Bang, Gaurav Tarlok Kakkar, Pramod Chunduri, Subrata Mitra, Joy Arulraj

Abstract: State-of-the-art video database management systems (VDBMSs) often use lightweight proxy models to accelerate object retrieval and aggregate queries. The key assumption underlying these systems is that the proxy model is an order of magnitude faster than the heavyweight oracle model. However, recent advances in computer vision have invalidated this assumption. Inference time of recently proposed oracle models is on par with or even lower than the proxy models used in state-of-the-art (SoTA) VDBMSs. This paper presents Seiden, a VDBMS that leverages this radical shift in the runtime gap between the oracle and proxy models. Instead of relying on a proxy model, Seiden directly applies the oracle model over a subset of frames to build a query-agnostic index, and samples additional frames to answer the query using an exploration-exploitation scheme during query processing. By leveraging the temporal continuity of the video and the output of the oracle model on the sampled frames, Seiden delivers faster query processing and better query accuracy than SoTA VDBMSs. Our empirical evaluation shows that Seiden is on average 6.6x faster than SoTA VDBMSs across diverse queries and datasets.

Pages: 2289-2301
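The idea of running the oracle on a subset of frames and relying on temporal continuity for the rest can be sketched as follows. The sketch assumes a fixed sampling stride and linear interpolation between the two nearest sampled frames; both are illustrative assumptions, and the exploration-exploitation sampling that the abstract mentions is not modeled. The `oracle` callable and the function names are hypothetical.

```python
import bisect


def build_index(num_frames, oracle, stride):
    """Run the oracle on every `stride`-th frame plus the last frame."""
    sampled = sorted(set(range(0, num_frames, stride)) | {num_frames - 1})
    return {frame: oracle(frame) for frame in sampled}


def estimate(index, frame):
    """Estimate the oracle output for `frame` by linear interpolation
    between the nearest sampled frames on either side."""
    if frame in index:
        return float(index[frame])
    keys = sorted(index)
    pos = bisect.bisect_left(keys, frame)
    left, right = keys[pos - 1], keys[pos]
    t = (frame - left) / (right - left)
    return index[left] + t * (index[right] - index[left])


calls = []


def toy_oracle(frame):
    """Stand-in for an expensive model: here, an object count of 2 * frame."""
    calls.append(frame)
    return 2 * frame


index = build_index(num_frames=21, oracle=toy_oracle, stride=5)
approx = estimate(index, 7)
```

In this toy setup the oracle runs on 5 of 21 frames, and the estimate for an unsampled frame is exact only because the toy signal is linear; real video would need additional sampling where the signal changes quickly.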

Citations: 1

VeriBench: Analyzing the Performance of Database Systems with Verifiability

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI: 10.14778/3598581.3598588

Cong Yue, Meihui Zhang, Changhao Zhu, Gang Chen, Dumitrel Loghin, B. Ooi

Abstract: Database systems have been paying more attention to data security in recent years. Immutable systems such as blockchains, verifiable databases, and ledger databases are equipped with various verifiability mechanisms to protect data. Such systems often adopt different threat models and techniques, and therefore have different performance implications compared to traditional database systems. So far, there is no uniform benchmarking tool for evaluating the performance of these systems, especially at the level of verification functions. In this paper, we first survey the design space of verifiability-enabled database systems along five dimensions: threat model, authenticated data structure (ADS), query processing, verification, and auditing. Based on this survey, we design and implement VeriBench, a benchmark framework for verifiability-enabled database systems. VeriBench enables a fair comparison of systems designed with different underlying technologies that share the client-side verification scheme, and focuses on design space exploration to provide a deeper understanding of different system design choices. VeriBench incorporates micro- and macro-benchmarks to provide a comprehensive evaluation. Further, VeriBench is designed to enable easy extension for benchmarking new systems and workloads. We run VeriBench to conduct a comprehensive analysis of state-of-the-art systems comprising blockchains, ledger databases, and log transparency technologies. The results expose the weaknesses and strengths of each underlying design choice, and the insights should serve as guidance for future development.

Pages: 2145-2157
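As an illustration of an authenticated data structure with client-side verification, the sketch below builds a binary Merkle tree over a list of records and checks an inclusion proof against the root. It assumes SHA-256 and the common convention of duplicating the last node on odd-sized levels; the systems surveyed in the paper use their own structures, and the function names here are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return the Merkle root and an inclusion proof for leaves[index].

    The proof is a list of (sibling_hash, sibling_is_left) pairs from the
    leaf level up to the root. `leaves` must be non-empty."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf, proof, root):
    """Client-side check: recompute the path to the root from the leaf."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


records = [b"a", b"b", b"c"]
root, proof = merkle_root_and_proof(records, 2)
```

The client only needs the trusted root and a proof whose length is logarithmic in the number of records, which is the kind of verification cost a benchmark at the level of verification functions would measure.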

Citations: 0