The VLDB Journal: Latest Articles

Flexible grouping of linear segments for highly accurate lossy compression of time series data
The VLDB Journal Pub Date : 2024-07-15 DOI: 10.1007/s00778-024-00862-z
Xenophon Kitsios, Panagiotis Liakos, Katia Papakonstantinopoulou, Y. Kotidis
Citations: 0
FlexPushdownDB: rethinking computation pushdown for cloud OLAP DBMSs
The VLDB Journal Pub Date : 2024-07-10 DOI: 10.1007/s00778-024-00867-8
Yifei Yang, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, Michael Stonebraker
Abstract: Modern cloud-native OLAP databases adopt a storage-disaggregation architecture that separates the management of computation and storage. A major bottleneck in such an architecture is the network connecting the computation and storage layers. Computation pushdown is a promising solution to tackle this issue, which offloads some computation tasks to the storage layer to reduce network traffic. This paper presents FlexPushdownDB (FPDB), where we revisit the design of computation pushdown in a storage-disaggregation architecture, and then introduce several optimizations to further accelerate query processing. First, FPDB supports hybrid query execution, which combines local computation on cached data and computation pushdown to cloud storage at a fine granularity. Within the cache, FPDB uses a novel Weighted-LFU cache replacement policy that takes into account the cost of pushdown computation. Second, we design adaptive pushdown as a new mechanism to avoid throttling the storage-layer computation during pushdown, which pushes the request back to the computation layer at runtime if the storage-layer computational resource is insufficient. Finally, we derive a general principle to identify pushdown-amenable computational tasks, by summarizing common patterns of pushdown capabilities in existing systems, and further propose two new pushdown operators, namely, selection bitmap and distributed data shuffle. Evaluation on SSB and TPC-H shows each optimization can improve the performance by 2.2×, 1.9×, and 3×, respectively.
Citations: 0
Open benchmark for filtering techniques in entity resolution
The VLDB Journal Pub Date : 2024-07-09 DOI: 10.1007/s00778-024-00868-7
Franziska Neuhof, Marco Fisichella, George Papadakis, Konstantinos Nikoletos, Nikolaus Augsten, Wolfgang Nejdl, Manolis Koubarakis
Abstract: Entity Resolution identifies entity profiles that represent the same real-world object. A brute-force approach that considers all pairs of entities suffers from quadratic time complexity. To ameliorate this issue, filtering techniques reduce the search space to highly similar and, thus, highly likely matches. Such techniques come in two forms: (i) blocking workflows group together entity profiles with identical or similar signatures, and (ii) nearest-neighbor workflows convert all entity profiles into vectors and detect the ones closest to every query entity. The main techniques of these two types have never been juxtaposed in a systematic way and, thus, their relative performance is unknown. To cover this gap, we perform an extensive experimental study that investigates the relative performance of the main representatives per type over numerous established datasets. Comparing techniques of different types in a fair way is a non-trivial task, because the configuration parameters of each approach have a significant impact on its performance, but are hard to fine-tune. We consider a plethora of parameter configurations per method, optimizing each workflow with respect to recall and precision in both schema-agnostic and schema-aware settings. The experimental results provide novel insights into the effectiveness, the time efficiency, the memory footprint, and the scalability of the considered techniques.
Citations: 0
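To make the "blocking workflow" idea in the abstract above more concrete, here is a minimal sketch of one simple form of blocking, token blocking, in Python. It is an illustration only and not the benchmark's own code: the function name `token_blocking`, the whitespace tokenization, and the sample profiles are all assumptions introduced here.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(profiles):
    """Return candidate pairs of profile ids that share at least one token.

    `profiles` maps a (hypothetical) profile id to its text. Each distinct
    lowercase token acts as a block signature; only profiles that fall into
    the same block are compared, instead of all pairs.
    """
    blocks = defaultdict(set)
    for pid, text in profiles.items():
        for token in set(text.lower().split()):
            blocks[token].add(pid)

    candidates = set()
    for ids in blocks.values():
        # Every pair inside a block becomes a candidate match.
        for a, b in combinations(sorted(ids), 2):
            candidates.add((a, b))
    return candidates


if __name__ == "__main__":
    sample = {
        1: "Apple iPhone 12",
        2: "iphone 12 apple",
        3: "Samsung Galaxy S21",
    }
    print(token_blocking(sample))
```

In this toy input only profiles 1 and 2 share tokens, so they form the single candidate pair and profile 3 is never compared against them.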
Minimum motif-cut: a workload-aware RDF graph partitioning strategy
The VLDB Journal Pub Date : 2024-07-08 DOI: 10.1007/s00778-024-00860-1
Peng Peng, Shengyi Ji, M. Tamer Özsu, Lei Zou
Abstract: In designing a distributed RDF system, it is quite common to divide an RDF graph into subgraphs, called partitions, which are then distributed. Graph partitioning in general and RDF graph partitioning in particular are challenging problems. In this paper, we propose an RDF graph partitioning approach, called Minimum Motif-Cut (MMC for short), to maximize the number of SPARQL queries in a workload that can be evaluated within one partition without interpartition joins. The motif is a common structure that occurs in queries. We prove that the MMC partitioning problem is NP-complete and propose two greedy heuristic algorithms to solve it. One algorithm is basic, while the other is more advanced and optimized for data localization. A query is decomposed into a set of independently evaluatable subqueries based on RDF graph partitioning. The subqueries are executed in a distributed fashion and the results are assembled for the final result. Extensive experiments over synthetic and real RDF graphs and their corresponding logs show that the proposed technique can significantly avoid interpartition joins and results in good performance.
Citations: 0
GPU-based butterfly counting
The VLDB Journal Pub Date : 2024-06-27 DOI: 10.1007/s00778-024-00861-0
Yifei Xia, Feng Zhang, Qingyu Xu, Mingde Zhang, Zhiming Yao, Lv Lu, Xiaoyong Du, Dong Deng, Bingsheng He, Siqi Ma
Abstract: When dealing with large bipartite graphs, butterfly counting is a crucial and time-consuming operation. Graphics processing units (GPUs) are widely used parallel heterogeneous devices that can significantly boost performance for data science programs. However, currently no work enables efficient butterfly counting on GPU. To fill this gap, we propose a GPU-based butterfly counting method, called G-BFC. G-BFC solves three significant technical problems. First, butterfly counting involves massive serial operations, which leads to severe synchronization overheads and performance degradation. We unlock the serial region and utilize the shared memory on GPU to efficiently handle it. Second, butterfly counting on GPU faces the workload imbalance problem. To maximize efficiency, we develop a novel adaptive strategy to balance the workload among threads. Third, the large number of two-hop paths, also known as wedges, in bipartite graphs makes parallel butterfly counting difficult to traverse. We develop an innovative preprocessing strategy that can significantly cut down on the required number of wedges. We conduct comprehensive experiments on both server-grade and edge-grade GPU platforms, and experiments show that G-BFC brings significant performance benefits. G-BFC achieves 4.84× performance speedup over the state-of-the-art solution on eleven real-world datasets.
Citations: 0
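For readers unfamiliar with the terms in the abstract above, a butterfly is a complete 2×2 bipartite subgraph, and it can be counted by enumerating wedges (two-hop paths). The following is a plain, sequential Python sketch of that textbook wedge-based count. It is not G-BFC and involves no GPU code; the function name `count_butterflies` and the edge-list input format are assumptions made for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def count_butterflies(edges):
    """Count butterflies (2x2 bicliques) in a bipartite graph.

    `edges` is an iterable of (left, right) pairs. For every pair of left
    vertices we count the wedges that connect them through a common right
    vertex; c wedges between the same pair yield c * (c - 1) / 2 butterflies.
    """
    lefts_of = defaultdict(set)
    for left, right in edges:
        lefts_of[right].add(left)

    wedge_count = defaultdict(int)
    for lefts in lefts_of.values():
        # Each pair of left vertices under one right vertex is one wedge.
        for a, b in combinations(sorted(lefts), 2):
            wedge_count[(a, b)] += 1

    return sum(c * (c - 1) // 2 for c in wedge_count.values())


if __name__ == "__main__":
    # A complete 2x2 bipartite graph contains exactly one butterfly.
    k22 = [(0, "a"), (0, "b"), (1, "a"), (1, "b")]
    print(count_butterflies(k22))
```

The number of wedges grows quickly with vertex degree, which is why the abstract singles out wedge reduction as one of the three problems to solve.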
Optimizing LSM-based indexes for disaggregated memory
The VLDB Journal Pub Date : 2024-06-19 DOI: 10.1007/s00778-024-00863-y
Ruihong Wang, Chuqing Gao, Jianguo Wang, Prishita Kadam, M. Tamer Özsu, Walid G. Aref
Abstract: The emerging trend of memory disaggregation where CPU and memory are physically separated from each other and are connected via ultra-fast networking, e.g., over Remote Direct Memory Access (RDMA), allows elastic and independent scaling of compute (CPU) and main memory. This paper investigates how indexing can be efficiently designed in the memory disaggregated architecture. Although existing research has optimized the B-tree for this new architecture, its performance is unsatisfactory. This paper focuses on LSM-based indexing and proposes dLSM, the first highly optimized LSM-tree for disaggregated memory. dLSM introduces a suite of optimizations including reducing software overhead, leveraging near-data computing, tuning for byte-addressability, and an instantiation over RDMA as a case study with RDMA-specific customizations to improve system performance. Experiments illustrate that dLSM achieves 2.3× to 11.6× higher write throughput than running the optimized B-tree and four adaptations of existing LSM-tree indexes over disaggregated memory. dLSM is written in C++ (with approximately 54,400 LOC), and is open-sourced.
Citations: 0
Performant almost-latch-free data structures using epoch protection in more depth
The VLDB Journal Pub Date : 2024-06-17 DOI: 10.1007/s00778-024-00859-8
Tianyu Li, Badrish Chandramouli, Samuel Madden
Abstract: Multi-core scalability presents a major implementation challenge for data system designers today. Traditional methods such as latching no longer scale in today’s highly parallel architectures. While the designer can make use of techniques such as latch-free programming to painstakingly design specialized, highly-performant solutions, such solutions are often intricate to build and difficult to reason about. Of particular interest to data system designers is a class of data structures we call almost-latch-free; such data structures can be made scalable in the common case, but have rare complications (e.g., dynamic resizing) that prevent full latch-free implementations. In this work, we present a new programming framework called Epoch-Protected Version Scheme (EPVS) to make it easy to build such data structures. EPVS makes use of epoch protection to preserve performance in the common case of latch-free operations, while allowing users to specify critical sections that execute under mutual exclusion for the rare, non-latch-free operations. We showcase the use of EPVS-based concurrency primitives in a few practical systems to demonstrate its competitive performance and intuitive guarantees. EPVS is available in open source as part of Microsoft’s FASTER project (Epoch Protected Version Scheme (source code) 2022; Microsoft FASTER 2022).
Citations: 0
A survey on hybrid transactional and analytical processing
The VLDB Journal Pub Date : 2024-06-04 DOI: 10.1007/s00778-024-00858-9
Haoze Song, Wenchao Zhou, Heming Cui, Xiang Peng, Feifei Li
Abstract: To provide applications with the ability to analyze fresh data and eliminate the time-consuming ETL workflow, hybrid transactional and analytical (HTAP) systems have been developed to serve online transaction processing and online analytical processing workloads in a single system. In recent years, HTAP systems have attracted considerable interest from both academia and industry. Several new architectures and technologies have been proposed. This paper provides a comprehensive overview of these HTAP systems. We review recently published papers and technical reports in this field and broadly classify existing HTAP systems into two categories based on their data formats: monolithic and hybrid HTAP. We further classify hybrid HTAP into four sub-categories based on their storage architecture: row-oriented, column-oriented, separated, and hybrid. Based on such a taxonomy, we outline each stream’s design challenges and performance issues (e.g., the contradictory format demand for monolithic HTAP). We then discuss potential solutions and their trade-offs by reviewing noteworthy research findings. Finally, we summarize emerging HTAP applications, benchmarks, future trends, and open problems.
Citations: 0
Reliability evaluation of individual predictions: a data-centric approach
The VLDB Journal Pub Date : 2024-05-30 DOI: 10.1007/s00778-024-00857-w
Nima Shahbazi, Abolfazl Asudeh
Abstract: Machine learning models only provide probabilistic guarantees on the expected loss of random samples from the distribution represented by their training data. As a result, a model with high accuracy may or may not be reliable for predicting an individual query point. To address this issue, XAI aims to provide explanations of individual predictions, while approaches such as conformal predictions, probabilistic predictions, and prediction intervals count on the model’s certainty in its prediction to identify unreliable cases. Conversely, instead of relying on the model itself, we look for insights in the training data. That is, following the fact that a model’s performance is limited to the data it has been trained on, we ask “is a model trained on a given data set fit for making a specific prediction?”. Specifically, we argue that a model’s prediction is not reliable if (i) there were not enough similar instances in the training set to the query point, and (ii) there is a high fluctuation (uncertainty) in the vicinity of the query point in the training set. Using these two observations, we propose data-centric reliability measures for individual predictions and develop novel algorithms for efficient and effective computation of the reliability measures during inference time. The proposed algorithms learn the necessary components of the measures from the data itself and are sublinear, which makes them scalable to very large and multi-dimensional settings. Furthermore, an estimator is designed to enable no-data access during the inference time. We conduct extensive experiments using multiple real and synthetic data sets and different tasks, which reflect a consistent correlation between distrust values and model performance.
Citations: 0
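The two observations in the abstract above (too few similar training instances, and high label fluctuation near the query point) can be illustrated with a small brute-force k-nearest-neighbor sketch in Python. This is not the paper's measure or its sublinear algorithm: the function name `neighborhood_signals`, the use of Euclidean distance, the choice of `k`, and the majority-label disagreement score are all assumptions made purely for illustration.

```python
import math


def neighborhood_signals(train, query, k=3):
    """Return (radius, disagreement) for a query point.

    `train` is a list of (feature_tuple, label) pairs.
    radius:       distance to the k-th nearest training point; a large value
                  suggests few similar instances near the query.
    disagreement: fraction of the k nearest labels that differ from their
                  majority label; a large value suggests local fluctuation.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    radius = math.dist(nearest[-1][0], query)
    labels = [label for _, label in nearest]
    majority = max(set(labels), key=labels.count)
    disagreement = 1 - labels.count(majority) / len(labels)
    return radius, disagreement


if __name__ == "__main__":
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((10, 10), 1)]
    print(neighborhood_signals(data, (0, 0)))    # dense, consistent region
    print(neighborhood_signals(data, (10, 10)))  # isolated point
```

A query inside the dense cluster gets a small radius and no disagreement, while the isolated point gets a large radius and mixed neighbor labels, which is the kind of case the abstract flags as unreliable.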
Efficient cryptanalysis of an encrypted database supporting data interoperability
The VLDB Journal Pub Date : 2024-05-23 DOI: 10.1007/s00778-024-00852-1
Gongyu Shi, Geng Wang, Shi-Feng Sun, Dawu Gu
Citations: 0