Advances in database technology : proceedings. International Conference on Extending Database Technology: Latest Papers, Page 7

Consent Management in Data Workflows: A Graph Problem

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2023.61

Dorota Filipczuk, E. Gerding, G. Konstantinidis

Citations: 0

Multi-Task Processing in Vertex-Centric Graph Systems: Evaluations and Insights

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2023.20

Siqiang Luo, Zichen Zhu, Xiaokui Xiao, Y. Yang, Chunbo Li, B. Kao

Abstract: Vertex-centric (VC) graph systems are at the core of large-scale distributed graph processing. For such systems, a common usage pattern is the concurrent processing of multiple tasks (multi-processing for short), which aims to execute a large number of unit tasks in parallel. In this paper, we point out that multi-processing has not been sufficiently studied or evaluated in previous work; hence, we fill this critical gap with three major contributions. First, we examine the tradeoff between two important measures in VC-systems: the number of communication rounds and message congestion. We show that this tradeoff is crucial to system performance; yet, existing approaches fail to achieve an optimal tradeoff, leading to poor performance. Second, based on extensive experimental evaluations on mainstream VC systems (e.g., Giraph, Pregel+, GraphD) and benchmark multi-processing tasks (e.g., Batch Personalized PageRanks, Multiple Source Shortest Paths), we present several important insights on the correlation between system performance and configurations, which is valuable to practitioners in optimizing system performance. Third, based on the insights drawn from our experimental evaluations, we present a cost-based tuning framework that optimizes the performance of a representative VC-system. This demonstrates the usefulness of the insights.

Pages: 247-259

Citations: 3

E2-NVM: A Memory-Aware Write Scheme to Improve Energy Efficiency and Write Endurance of NVMs using Variational Autoencoders

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2023.49

Saeed Kargar, Binbin Gu, S. Jyothi, Faisal Nawab

Citations: 2

Describing and Assessing Cubes Through Intentional Analytics

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2023.69

Matteo Francia, M. Golfarelli, S. Rizzi

Citations: 0

Detecting Stale Data in Wikipedia Infoboxes

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2023.36

Malte Barth, Tibor Bleidt, Martin Büßemeyer, Fabian Heseding, Niklas Köhnecke, Tobias Bleifuß, Leon Bornemann, D. Kalashnikov, Felix Naumann, D. Srivastava

Citations: 0

PyFroid: Scaling Data Analysis on a Commodity Workstation

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2024.06

Venkatesh Emani, A. Floratou, C. Curino

Abstract: Almost every organization today is promoting data-driven decision making leveraging advances in data science. According to various surveys, data scientists spend up to 80% of their time cleaning and transforming data. Although data management systems have been carefully optimized for such tasks over several decades, they are seldom leveraged by data scientists who prefer to use libraries such as Pandas, sacrificing performance and scalability in favor of familiarity and ease of use. As a result, data scientists are not able to fully leverage the hardware capabilities of commodity workstations and either end up working on a small sample of their data locally or migrate to more heavyweight frameworks in a cluster environment. In this paper, we present PyFroid, a framework that leverages lightweight relational databases to improve the performance and scalability of Pandas, allowing data scientists to operate on much larger datasets on a commodity workstation. PyFroid has zero learning curve as it maintains all the Pandas APIs and is fully compatible with the tools that data scientists use (e.g., Python notebooks). We experimentally demonstrate that, compared to Pandas, PyFroid is able to analyze up to 20X more data on the same machine, provide comparable or better performance for small datasets as well as near-memory data sizes, and consume much less resources.

Pages: 61-67

Citations: 0

KWIQ: Answering k-core Window Queries in Temporal Networks

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2023.17

Mahdihusain Momin, Raj Kamal, Shantwana Dixit, Sayan Ranu, A. Bagchi

Abstract: Understanding the evolution of communities and the factors that contribute to their development, stability and disappearance over time is a fundamental problem in the study of temporal networks. The concept of k-core is one of the most popular metrics to detect communities. Since the k-core of a temporal network changes with time, an important question arises: Are there nodes that always remain within the k-core? In this paper, we explore this question by introducing the notion of core-invariant nodes. Given a temporal window ∆ and a parameter K, the core-invariant nodes are those that are part of the K-core throughout ∆. Core-invariant nodes have been shown to dictate the stability of networks, while being also useful in detecting anomalous behavior. The complexity of finding core-invariant nodes is O(|∆| × |E|), which is exorbitantly high for million-scale networks. We overcome this computational bottleneck by designing an algorithm called Kwiq. Kwiq efficiently processes the cascading impact of network updates through a novel data structure called orientation graph. Through extensive experiments on real temporal networks containing millions of nodes, we establish that the proposed pruning strategies are more than 5 times faster than baseline strategies.

Pages: 208-220

Citations: 0
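
The definition of core-invariant nodes in the KWIQ abstract can be illustrated with a naive baseline: recompute the K-core for every timestamp in the window and intersect the results. This is a minimal sketch of that brute-force idea only, not the Kwiq algorithm or its orientation graph. It assumes the window is given as a list of per-timestamp edge lists; the function names `k_core` and `core_invariant_nodes` are hypothetical.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the node set of the k-core of an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    removed = set()
    # Peel nodes whose degree is below k until none remain.
    stack = [node for node, d in degree.items() if d < k]
    while stack:
        node = stack.pop()
        if node in removed:
            continue
        removed.add(node)
        for nbr in adj[node]:
            if nbr not in removed:
                degree[nbr] -= 1
                if degree[nbr] == k - 1:  # just dropped below k
                    stack.append(nbr)
    return set(adj) - removed


def core_invariant_nodes(snapshots, k):
    """Nodes that belong to the k-core of every snapshot in the window.

    Assumption: `snapshots` is a list of edge lists, one per timestamp.
    """
    result = None
    for edges in snapshots:
        core = k_core(edges, k)
        result = core if result is None else result & core
        if not result:
            break  # the intersection can only shrink
    return result or set()


window = [
    [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")],
    [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")],
]
print(core_invariant_nodes(window, 2))
```

Each snapshot costs time linear in its edge count, so the whole window costs on the order of |∆| × |E|, which is the baseline cost the abstract describes as too high for million-scale networks.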

An Experimental Analysis of Quantile Sketches over Data Streams

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2023.34

Lasantha Fernando, Harsh Bindra, Khuzaima S. Daudjee

Abstract: Streaming systems process large data sets in a single pass while applying operations on the data. Quantiles are one such operation used in streaming systems. Quantiles can outline the behaviour and the cumulative distribution of a data set. We study five recent quantile sketching algorithms designed for streaming settings: KLL Sketch, Moments Sketch, DDSketch, UDDSketch, and ReqSketch. Key aspects of the sketching algorithms in terms of speed, accuracy, and mergeability are examined. The accuracy of these algorithms is evaluated in Apache Flink, a popular open source streaming system, while the speed and mergeability is evaluated in a separate Java implementation. Results show that UDDSketch has the best relative-error accuracy guarantees, while DDSketch and ReqSketch also achieve consistently high accuracy, particularly with long-tailed data distributions. DDSketch has the fastest query and insertion times, while Moments Sketch has the fastest merge times. Our evaluations show that there is no single algorithm that dominates overall performance and different algorithms excel under the different accuracy and run-time performance criteria considered in our study.

Pages: 424-436

Citations: 0
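
To make the terms "relative-error accuracy" and "mergeability" in the quantile sketch abstract concrete, the following is a minimal sketch of a logarithmic-bucket quantile summary in the spirit of DDSketch. It is an illustration under simplifying assumptions (positive values only, an unbounded number of buckets), not the implementation evaluated in the paper and not the API of any DDSketch library; the class name `LogBucketSketch` is hypothetical.

```python
import math
from collections import Counter


class LogBucketSketch:
    """Simplified relative-error quantile sketch (positive values only)."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        # Bucket i covers (gamma^(i-1), gamma^i].
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()
        self.count = 0

    def add(self, x):
        if x <= 0:
            raise ValueError("this sketch handles positive values only")
        index = math.ceil(math.log(x) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def merge(self, other):
        """Fold another sketch with the same alpha into this one."""
        if other.alpha != self.alpha:
            raise ValueError("sketches must share the same alpha")
        self.buckets.update(other.buckets)  # Counter.update adds counts
        self.count += other.count

    def quantile(self, q):
        if not 0 <= q <= 1:
            raise ValueError("q must be in [0, 1]")
        if self.count == 0:
            raise ValueError("empty sketch")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Representative value within alpha of every bucket member.
                return 2 * self.gamma ** index / (self.gamma + 1)


sketch = LogBucketSketch(alpha=0.01)
for value in range(1, 1001):
    sketch.add(value)
print(sketch.quantile(0.5))
```

Merging only adds bucket counts, so two sketches built on separate partitions of a stream give the same answers as one sketch built on the whole stream.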

VOYAGER: Automatic Computation of Visual Complexity and Aesthetics of Graph Query Interfaces

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2023.72

Duy Pham, S. Bhowmick

Citations: 1

An Intrinsically Interpretable Entity Matching System

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2023-01-01 DOI: 10.48786/edbt.2023.54

Andrea Baraldi, Francesco Del Buono, Francesco Guerra, Matteo Paganelli, M. Vincini

Abstract: Explainable classification systems generate predictions along with a weight for each term in the input record measuring its contribution to the prediction. In the entity matching (EM) scenario, inputs are pairs of entity descriptions and the resulting explanations can be difficult to understand for the users. They can be very long and assign different impacts to similar terms located in different descriptions. To address these issues, we introduce the concept of decision units, i.e., basic information units formed either by pairs of (similar) terms, each one belonging to a different entity description, or unique terms, existing in one of the descriptions only. Decision units form a new feature space, able to represent, in a compact and meaningful way, pairs of entity descriptions. An explainable model trained on such features generates effective explanations customized for EM datasets. In this paper, we propose this idea via a three-component architecture template, which consists of a decision unit generator, a decision unit scorer, and an explainable matcher. Then, we introduce WYM (Why do You Match?), an implementation of the architecture oriented to textual EM databases. The experiments show that our approach has accuracy comparable to other state-of-the-art Deep Learning based EM models, but, differently from them, its predictions are highly interpretable.

Pages: 645-657

Citations: 0
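
The notion of decision units in the entity matching abstract (pairs of similar terms drawn from the two descriptions, plus terms unique to one description) can be sketched as follows. This is an illustration only, not the WYM generator: it assumes whitespace tokenization, character-level similarity from Python's `difflib.SequenceMatcher`, a greedy one-to-one pairing, and a hypothetical threshold of 0.8. The function name `decision_units` is also hypothetical.

```python
from difflib import SequenceMatcher


def decision_units(left, right, threshold=0.8):
    """Split two entity descriptions into paired and unique decision units.

    Assumptions (not from the paper): whitespace tokens, string similarity
    via SequenceMatcher.ratio(), greedy best-first one-to-one pairing.
    """
    left_terms = left.lower().split()
    right_terms = right.lower().split()

    # Score every cross-description term pair above the threshold.
    candidates = []
    for i, a in enumerate(left_terms):
        for j, b in enumerate(right_terms):
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                candidates.append((score, i, j))

    # Highest similarity first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_left, used_right, paired = set(), set(), []
    for _score, i, j in candidates:
        if i not in used_left and j not in used_right:
            used_left.add(i)
            used_right.add(j)
            paired.append((left_terms[i], right_terms[j]))

    # Terms left without a partner become unique decision units.
    unique = [(t, None) for i, t in enumerate(left_terms) if i not in used_left]
    unique += [(None, t) for j, t in enumerate(right_terms) if j not in used_right]
    return paired, unique


paired, unique = decision_units("Apple iPhone 12 black", "iphone 12 apple 64gb")
print(paired)
print(unique)
```

In the architecture the abstract describes, units like these would then be passed to a scorer and an explainable matcher; those two components are not sketched here.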