Advances in database technology : proceedings. International Conference on Extending Database Technology: Latest Articles

Consent Management in Data Workflows: A Graph Problem
Dorota Filipczuk, E. Gerding, G. Konstantinidis
{"title":"Consent Management in Data Workflows: A Graph Problem","authors":"Dorota Filipczuk, E. Gerding, G. Konstantinidis","doi":"10.48786/edbt.2023.61","DOIUrl":"https://doi.org/10.48786/edbt.2023.61","url":null,"abstract":"In modern data processing systems users expect a service provider to automatically respect their consent in all data processing within the service. However, data may be processed for many different purposes by several layers of algorithms that create complex workflows. To date, there is no existing approach to automatically satisfy fine-grained privacy constraints of a user in a way which optimises the service provider’s gains from processing. In this paper, we model a data processing workflow as a graph. User constraints and processing purposes are pairs of vertices which need to be disconnected in this graph. We propose heuristics and algorithms while at the same time we show that, in general, this problem is NP-hard. We discuss the optimality versus efficiency of our algorithms and evaluate them using synthetically generated data. On the practical side, our algorithms can provide a nearly optimal solution in the face of tens of constraints and graphs of thousands of nodes, in a few seconds.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"35 1","pages":"737-748"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87213118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
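The abstract above frames consent as pairs of vertices that must be disconnected in a workflow graph. The following is a minimal sketch of that framing only, not the authors' heuristics: it assumes a directed graph, and the node names, the `reachable` and `violated_constraints` helpers, and the example workflow are all hypothetical. It checks which constraint pairs are still connected after a chosen set of edges is removed.

```python
from collections import defaultdict, deque


def reachable(edges, source, target):
    """Return True if `target` can be reached from `source` along directed edges."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def violated_constraints(edges, removed, constraints):
    """List the (data, purpose) pairs still connected once `removed` edges are dropped."""
    kept = [edge for edge in edges if edge not in removed]
    return [(s, t) for s, t in constraints if reachable(kept, s, t)]


# Hypothetical workflow: data items flow through a profile into two purposes.
workflow = [
    ("email", "profile"),
    ("location", "profile"),
    ("profile", "ads"),
    ("profile", "recommendations"),
    ("location", "routing"),
]
# Assumed user constraint: location data must never reach advertising.
constraints = [("location", "ads")]

print(violated_constraints(workflow, set(), constraints))
print(violated_constraints(workflow, {("location", "profile")}, constraints))
```

Removing `("profile", "ads")` would also satisfy the constraint, but it would cut `email` off from `ads` as well, which illustrates why choosing the cut that preserves the most processing is the hard part of the problem.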
Multi-Task Processing in Vertex-Centric Graph Systems: Evaluations and Insights
Siqiang Luo, Zichen Zhu, Xiaokui Xiao, Y. Yang, Chunbo Li, B. Kao
{"title":"Multi-Task Processing in Vertex-Centric Graph Systems: Evaluations and Insights","authors":"Siqiang Luo, Zichen Zhu, Xiaokui Xiao, Y. Yang, Chunbo Li, B. Kao","doi":"10.48786/edbt.2023.20","DOIUrl":"https://doi.org/10.48786/edbt.2023.20","url":null,"abstract":"Vertex-centric (VC) graph systems are at the core of large-scale distributed graph processing. For such systems, a common usage pattern is the concurrent processing of multiple tasks (multi-processing for short), which aims to execute a large number of unit tasks in parallel. In this paper, we point out that multi-processing has not been sufficiently studied or evaluated in previous work; hence, we fill this critical gap with three major contributions. First, we examine the tradeoff between two important measures in VC-systems: the number of communication rounds and message congestion. We show that this tradeoff is crucial to system performance; yet, existing approaches fail to achieve an optimal tradeoff, leading to poor performance. Second, based on extensive experimental evaluations on mainstream VC systems (e.g., Giraph, Pregel+, GraphD) and benchmark multi-processing tasks (e.g., Batch Personalized PageRanks, Multiple Source Shortest Paths), we present several important insights on the correlation between system performance and configurations, which is valuable to practitioners in optimizing system performance. Third, based on the insights drawn from our experimental evaluations, we present a cost-based tuning framework that optimizes the performance of a representative VC-system. This demonstrates the usefulness of the insights.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"4 1","pages":"247-259"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79759570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
E2-NVM: A Memory-Aware Write Scheme to Improve Energy Efficiency and Write Endurance of NVMs using Variational Autoencoders
Saeed Kargar, Binbin Gu, S. Jyothi, Faisal Nawab
{"title":"E2-NVM: A Memory-Aware Write Scheme to Improve Energy Efficiency and Write Endurance of NVMs using Variational Autoencoders","authors":"Saeed Kargar, Binbin Gu, S. Jyothi, Faisal Nawab","doi":"10.48786/edbt.2023.49","DOIUrl":"https://doi.org/10.48786/edbt.2023.49","url":null,"abstract":"We introduce E2-NVM, a software-level memory-aware storage layer to improve the Energy efficiency and write Endurance (E2) of NVMs. E2-NVM employs a Variational Autoencoder (VAE) based design to direct the write operations judiciously to the memory segments that minimize bit flips. E2-NVM can be augmented with existing indexing solutions. E2-NVM can also be combined with prior hardware-based solutions to further improve efficiency. We performed real evaluations on an Optane memory device that show that E2-NVM can achieve up to 56% reduction in energy consumption.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"10 1","pages":"578-590"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82318149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Describing and Assessing Cubes Through Intentional Analytics
Matteo Francia, M. Golfarelli, S. Rizzi
{"title":"Describing and Assessing Cubes Through Intentional Analytics","authors":"Matteo Francia, M. Golfarelli, S. Rizzi","doi":"10.48786/edbt.2023.69","DOIUrl":"https://doi.org/10.48786/edbt.2023.69","url":null,"abstract":"The Intentional Analytics Model (IAM) has been envisioned as a way to tightly couple OLAP and analytics by (i) letting users explore multidimensional cubes stating their intentions, and (ii) returning multidimensional data coupled with knowledge insights in the form of annotations of subsets of data. The goal of this demonstration is to showcase the IAM approach using a notebook where the user can create a data exploration session by writing describe and assess statements, whose results are displayed by combining tabular data and charts so as to bring the highlights discovered to the user’s attention. The demonstration plan will show the effectiveness of the IAM approach in supporting data exploration and analysis and its added value as compared to a traditional OLAP session by proposing two scenarios with guided interaction and letting users run custom sessions.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"2 1","pages":"803-806"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84552661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Detecting Stale Data in Wikipedia Infoboxes
Malte Barth, Tibor Bleidt, Martin Büßemeyer, Fabian Heseding, Niklas Köhnecke, Tobias Bleifuß, Leon Bornemann, D. Kalashnikov, Felix Naumann, D. Srivastava
{"title":"Detecting Stale Data in Wikipedia Infoboxes","authors":"Malte Barth, Tibor Bleidt, Martin Büßemeyer, Fabian Heseding, Niklas Köhnecke, Tobias Bleifuß, Leon Bornemann, D. Kalashnikov, Felix Naumann, D. Srivastava","doi":"10.48786/edbt.2023.36","DOIUrl":"https://doi.org/10.48786/edbt.2023.36","url":null,"abstract":"Today’s fast-paced society is increasingly reliant on correct and up-to-date data. Wikipedia is the world’s most popular source of knowledge, and its infoboxes contain concise semi-structured data with important facts about a page’s topic. However, these data are not always up-to-date: we do not expect Wikipedia editors to update items at the moment their true values change. Also, many pages might not be well maintained and users might forget to update the data, e.g., when they are on holiday. To detect stale data in Wikipedia infoboxes, we combine correlation-based and rule-based approaches trained on different temporal granularities, based on all infobox changes over 15 years of English Wikipedia. We are able to predict 8.19% of all changes with a precision of 89.69% over a whole year, thus meeting our target precision of","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"29 1","pages":"450-456"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82545399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PyFroid: Scaling Data Analysis on a Commodity Workstation
Venkatesh Emani, A. Floratou, C. Curino
{"title":"PyFroid: Scaling Data Analysis on a Commodity Workstation","authors":"Venkatesh Emani, A. Floratou, C. Curino","doi":"10.48786/edbt.2024.06","DOIUrl":"https://doi.org/10.48786/edbt.2024.06","url":null,"abstract":"Almost every organization today is promoting data-driven decision making leveraging advances in data science. According to various surveys, data scientists spend up to 80% of their time cleaning and transforming data. Although data management systems have been carefully optimized for such tasks over several decades, they are seldom leveraged by data scientists who prefer to use libraries such as Pandas, sacrificing performance and scalability in favor of familiarity and ease of use. As a result, data scientists are not able to fully leverage the hardware capabilities of commodity workstations and either end up working on a small sample of their data locally or migrate to more heavyweight frameworks in a cluster environment. In this paper, we present PyFroid, a framework that leverages lightweight relational databases to improve the performance and scalability of Pandas, allowing data scientists to operate on much larger datasets on a commodity workstation. PyFroid has zero learning curve as it maintains all the Pandas APIs and is fully compatible with the tools that data scientists use (e.g., Python notebooks). We experimentally demonstrate that, compared to Pandas, PyFroid is able to analyze up to 20X more data on the same machine, provide comparable or better performance for small datasets as well as near-memory data sizes, and consume much less resources.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"1 1","pages":"61-67"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89326433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
KWIQ: Answering k-core Window Queries in Temporal Networks
Mahdihusain Momin, Raj Kamal, Shantwana Dixit, Sayan Ranu, A. Bagchi
{"title":"KWIQ: Answering k-core Window Queries in Temporal Networks","authors":"Mahdihusain Momin, Raj Kamal, Shantwana Dixit, Sayan Ranu, A. Bagchi","doi":"10.48786/edbt.2023.17","DOIUrl":"https://doi.org/10.48786/edbt.2023.17","url":null,"abstract":"Understanding the evolution of communities and the factors that contribute to their development, stability and disappearance over time is a fundamental problem in the study of temporal networks. The concept of 𝑘-core is one of the most popular metrics to detect communities. Since the 𝑘-core of a temporal network changes with time, an important question arises: Are there nodes that always remain within the 𝑘-core? In this paper, we explore this question by introducing the notion of core-invariant nodes. Given a temporal window ∆ and a parameter K, the core-invariant nodes are those that are part of the K-core throughout ∆. Core-invariant nodes have been shown to dictate the stability of networks, while being also useful in detecting anomalous behavior. The complexity of finding core-invariant nodes is 𝑂(|∆| × |𝐸|), which is exorbitantly high for million-scale networks. We overcome this computational bottleneck by designing an algorithm called Kwiq. Kwiq efficiently processes the cascading impact of network updates through a novel data structure called orientation graph. Through extensive experiments on real temporal networks containing millions of nodes, we establish that the proposed pruning strategies are more than 5 times faster than baseline strategies.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"55 1","pages":"208-220"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73858039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
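The definition of core-invariant nodes in the abstract above can be illustrated with the naive baseline it alludes to: recompute the K-core of every snapshot in the window and intersect the results. This is a sketch of that baseline only, not of Kwiq or its orientation graph. It assumes the temporal network is given as a list of undirected edge lists, one per timestamp; the function names and the toy snapshots are hypothetical.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph, by repeated peeling."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)
    removed = set()
    stack = [node for node in adjacency if len(adjacency[node]) < k]
    while stack:
        node = stack.pop()
        if node in removed:
            continue
        removed.add(node)
        for neighbour in adjacency[node]:
            if neighbour not in removed:
                adjacency[neighbour].discard(node)
                if len(adjacency[neighbour]) < k:
                    stack.append(neighbour)
    return set(adjacency) - removed


def core_invariant_nodes(snapshots, k):
    """Nodes that belong to the k-core of every snapshot in the window."""
    if not snapshots:
        return set()
    result = k_core(snapshots[0], k)
    for edges in snapshots[1:]:
        result &= k_core(edges, k)
    return result


# Hypothetical window of three snapshots over nodes a, b, c, d.
snapshots = [
    [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")],
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],
    [("a", "b"), ("b", "d"), ("d", "a"), ("c", "a")],
]
print(sorted(core_invariant_nodes(snapshots, 2)))  # ['a', 'b']
```

Each snapshot is peeled from scratch here, so the work grows with the window length times the edge count, which is the cost the abstract describes as too high for million-scale networks.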
An Experimental Analysis of Quantile Sketches over Data Streams
Lasantha Fernando, Harsh Bindra, Khuzaima S. Daudjee
{"title":"An Experimental Analysis of Quantile Sketches over Data Streams","authors":"Lasantha Fernando, Harsh Bindra, Khuzaima S. Daudjee","doi":"10.48786/edbt.2023.34","DOIUrl":"https://doi.org/10.48786/edbt.2023.34","url":null,"abstract":"Streaming systems process large data sets in a single pass while applying operations on the data. Quantiles are one such operation used in streaming systems. Quantiles can outline the behaviour and the cumulative distribution of a data set. We study five recent quantile sketching algorithms designed for streaming settings: KLL Sketch, Moments Sketch, DDSketch, UDDSketch, and ReqSketch. Key aspects of the sketching algorithms in terms of speed, accuracy, and mergeability are examined. The accuracy of these algorithms is evaluated in Apache Flink, a popular open source streaming system, while the speed and mergeability is evaluated in a separate Java implementation. Results show that UDDSketch has the best relative-error accuracy guarantees, while DDSketch and ReqSketch also achieve consistently high accuracy, particularly with long-tailed data distributions. DDSketch has the fastest query and insertion times, while Moments Sketch has the fastest merge times. Our evaluations show that there is no single algorithm that dominates overall performance and different algorithms excel under the different accuracy and run-time performance criteria considered in our study.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"56 1","pages":"424-436"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78966876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
VOYAGER: Automatic Computation of Visual Complexity and Aesthetics of Graph Query Interfaces
Duy Pham, S. Bhowmick
{"title":"VOYAGER: Automatic Computation of Visual Complexity and Aesthetics of Graph Query Interfaces","authors":"Duy Pham, S. Bhowmick","doi":"10.48786/edbt.2023.72","DOIUrl":"https://doi.org/10.48786/edbt.2023.72","url":null,"abstract":"People prefer attractive visual query interfaces (VQI). Such interfaces are paramount for enhancing usability of graph querying frameworks. However, scant attention has been paid to the visual complexity and aesthetics of graph query interfaces. In this demonstration, we present a novel system called VOYAGER that leverages on research in computer vision, human-computer interaction (HCI) and cognitive psychology to automatically compute the visual complexity and aesthetics of a graph query interface. VOYAGER can not only guide VQI designers to iteratively improve their design to balance usability and aesthetics of visual query interfaces but it can also facilitate quantitative comparison of the visual complexity and aesthetics of a set of visual query interfaces. We demonstrate various innovative features of VOYAGER and its promising results.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"151 1","pages":"815-818"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77798363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
An Intrinsically Interpretable Entity Matching System
Andrea Baraldi, Francesco Del Buono, Francesco Guerra, Matteo Paganelli, M. Vincini
{"title":"An Intrinsically Interpretable Entity Matching System","authors":"Andrea Baraldi, Francesco Del Buono, Francesco Guerra, Matteo Paganelli, M. Vincini","doi":"10.48786/edbt.2023.54","DOIUrl":"https://doi.org/10.48786/edbt.2023.54","url":null,"abstract":"Explainable classification systems generate predictions along with a weight for each term in the input record measuring its contribution to the prediction. In the entity matching (EM) scenario, inputs are pairs of entity descriptions and the resulting explanations can be difficult to understand for the users. They can be very long and assign different impacts to similar terms located in different descriptions. To address these issues, we introduce the concept of decision units, i.e., basic information units formed either by pairs of (similar) terms, each one belonging to a different entity description, or unique terms, existing in one of the descriptions only. Decision units form a new feature space, able to represent, in a compact and meaningful way, pairs of entity descriptions. An explainable model trained on such features generates effective explanations customized for EM datasets. In this paper, we propose this idea via a three-component architecture template, which consists of a decision unit generator, a decision unit scorer, and an explainable matcher. Then, we introduce WYM (Why do You Match?), an implementation of the architecture oriented to textual EM databases. The experiments show that our approach has accuracy comparable to other state-of-the-art Deep Learning based EM models, but, differently from them, its predictions are highly interpretable.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"31 1","pages":"645-657"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87061940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0