Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (Latest Publications)

Efficient Dynamic Clustering: Capturing Patterns from Historical Cluster Evolution
Binbin Gu, Saeed Kargar, Faisal Nawab
DOI: 10.48550/arXiv.2203.00812
Abstract: Clustering aims to group unlabeled objects into clusters based on the similarity inherent among them. It is important for many tasks such as anomaly detection, database sharding, and record linkage. Many clustering methods are batch algorithms that incur high overhead, as they cluster all objects in the database from scratch or assume a purely incremental workload. In practice, database objects are continuously updated, added, and removed, which makes previous results stale. Running batch algorithms continuously is infeasible in such scenarios because of the significant overhead it would incur. This is particularly the case in high-velocity settings such as Internet of Things applications. In this paper, we tackle the problem of clustering in high-velocity dynamic scenarios, where objects are continuously updated, inserted, and deleted. Specifically, we propose a generally dynamic approach to clustering that utilizes previous clustering results. Our system, DynamicC, uses a machine learning model that is augmented with an existing batch algorithm. The DynamicC model trains by observing the clustering decisions made by the batch algorithm. After training, the DynamicC model is used in cooperation with the batch algorithm to achieve both accurate and fast clustering decisions. Experimental results on four real-world datasets and one synthetic dataset show that our approach outperforms the state-of-the-art method while achieving clustering results similarly accurate to those of the baseline batch algorithm.
Published: 2022-03-02 · Pages: 2:351-2:363
Citations: 7
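The paper's model is not reproduced here. As a rough, hypothetical illustration of the incremental-clustering idea the abstract describes (reuse previous results; fall back to an expensive decision only when the cheap path is uncertain), the sketch below assigns a new point to the nearest existing cluster when its centroid is close enough, with the distance threshold standing in for DynamicC's learned accept/fallback decision:

```python
import math

def centroid(points):
    """Mean point of a cluster's members."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def incremental_assign(clusters, point, threshold=1.0):
    """Assign `point` to the nearest existing cluster if its centroid is
    within `threshold`; otherwise open a new cluster. The threshold test
    is a stand-in for a learned decision model, not the paper's method."""
    best, best_dist = None, float("inf")
    for cid, members in clusters.items():
        d = math.dist(centroid(members), point)
        if d < best_dist:
            best, best_dist = cid, d
    if best is not None and best_dist <= threshold:
        clusters[best].append(point)
        return best
    new_id = max(clusters, default=-1) + 1
    clusters[new_id] = [point]
    return new_id
```

Each insertion touches only centroids instead of re-clustering the whole database, which is the cost profile the abstract contrasts with batch algorithms.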
3DPro: Querying Complex Three-Dimensional Data with Progressive Compression and Refinement
Dejun Teng, Yanhui Liang, Furqan Baig, Jun Kong, Vo Hoang, Fusheng Wang
DOI: 10.48786/edbt.2022.02
Abstract: Large-scale three-dimensional spatial data has gained increasing attention with the development of self-driving vehicles, mineral exploration, CAD, and human atlases. Such 3D objects are often represented with a polygonal model at high resolution to preserve accuracy. This poses major challenges for 3D data management and spatial queries due to the massive number of 3D objects, e.g., trillions of 3D cells, and the high complexity of 3D geometric computation. Traditional spatial querying methods in the Filter-Refine paradigm focus mainly on indexing-based filtering using approximations such as minimal bounding boxes, and largely neglect the heavy computation in the refinement step at the intra-geometry level, which often dominates the cost of query processing. In this paper, we introduce 3DPro, a system that supports efficient spatial queries for complex 3D objects. 3DPro uses progressive compression of 3D objects that preserves multiple levels of detail, which significantly reduces the size of the objects and lets the data fit in memory. Through a novel Filter-Progressive-Refine paradigm, 3DPro can return query results early whenever possible, minimizing decompression and geometric computation on higher-resolution representations of 3D objects. Our experiments demonstrate that 3DPro outperforms state-of-the-art 3D data processing techniques by up to an order of magnitude for typical spatial queries.
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/7e/40/nihms-1827080.PMC9540604.pdf
Published: 2022-03-01 · Pages: 104-117
Citations: 0
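The Filter-Refine pattern the abstract builds on can be illustrated independently of 3D geometry. The 2D sketch below (hypothetical object layout, not from the paper) filters candidates with cheap bounding-box tests and runs the exact containment check only on survivors; 3DPro's contribution is inserting progressive decompression between these two steps:

```python
def bbox(points):
    """Axis-aligned bounding box (minx, miny, maxx, maxy) of a point set."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    """True when two (minx, miny, maxx, maxy) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def query_intersecting(objects, region):
    """Filter-Refine query: cheap bounding-box test first (filter), exact
    point-in-region check only for the survivors (refine)."""
    hits = []
    for name, points in objects.items():
        if not boxes_overlap(bbox(points), region):   # filter step
            continue
        if any(region[0] <= x <= region[2] and region[1] <= y <= region[3]
               for x, y in points):                   # refine step
            hits.append(name)
    return hits
```

The refine step is the expensive part the abstract says dominates query cost; returning early from the filter avoids it whenever the boxes already disagree.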
Differentially-Private Publication of Origin-Destination Matrices with Intermediate Stops
Sina Shaham, Gabriel Ghinita, C. Shahabi
DOI: 10.48786/edbt.2022.04
Abstract: Conventional origin-destination (OD) matrices record the count of trips between pairs of start and end locations, and have been used extensively in transportation, traffic planning, and related areas. More recently, due to use cases such as modeling the spread of the COVID-19 pandemic, it has become increasingly important to also record intermediate points along an individual's path, rather than only the trip's start and end points. This can be achieved with a multi-dimensional frequency matrix over a partitioning of the data space at the desired level of granularity. However, serious privacy concerns arise when releasing OD matrix data, especially when multiple intermediate points are added, which makes individual trajectories more distinguishable to an attacker. To address this threat, we propose a technique for privacy-preserving publication of multi-dimensional OD matrices that achieves differential privacy (DP), the de facto standard for private data release. We propose a family of approaches that factor in important data properties, such as density and homogeneity, to build OD matrices that provide provable protection guarantees while preserving query accuracy. Extensive experiments on real and synthetic datasets show that the proposed approaches clearly outperform the existing state of the art.
Published: 2022-02-24 · Pages: 2:131-2:142
Citations: 2
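As background for the DP guarantee mentioned above, here is a minimal Laplace-mechanism sketch for releasing a count matrix. This is the textbook baseline only; the paper's contribution is the density- and homogeneity-aware matrix construction layered on top of such a mechanism, which is not reproduced here:

```python
import math
import random

def dp_release(matrix, epsilon, sensitivity=1.0):
    """Release a matrix of counts under epsilon-differential privacy by
    adding Laplace noise of scale sensitivity/epsilon to every cell,
    then clamping to non-negative integers for readability."""
    scale = sensitivity / epsilon

    def laplace_noise():
        # Inverse-CDF sampling of the Laplace(0, scale) distribution.
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    return [[max(0, round(cell + laplace_noise())) for cell in row]
            for row in matrix]
```

Smaller epsilon means larger noise and stronger privacy; adding intermediate stops enlarges the matrix, which is exactly the accuracy-vs-privacy tension the paper addresses.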
Model-Independent Design of Knowledge Graphs - Lessons Learnt From Complex Financial Graphs
Luigi Bellomarini, Andrea Gentili, Eleonora Laurenza, Emanuel Sallinger
DOI: 10.48786/edbt.2022.46
Abstract: We propose a model-independent design framework for Knowledge Graphs (KGs), capitalizing on our experience in KGs and model management for the rollout of a very large and complex financial KG for the Central Bank of Italy. KGs have recently garnered increasing attention from industry and are currently exploited in a variety of applications. Most common notions of KG share the presence of an extensional component, typically implemented as a graph database storing the enterprise data, and an intensional component that derives new implicit knowledge in the form of new nodes and edges. Our framework, KGModel, is based on a meta-level approach: the data engineer designs the extensional and intensional components of the KG, that is, the graph schema and the reasoning rules, at the meta-level. Then, in a model-driven fashion, this high-level specification is translated into schema definitions and reasoning rules that can be deployed on the target database systems and state-of-the-art reasoners. Our framework offers a model-independent visual modeling language, a logic-based language for the intensional component, and a set of complementary software tools for translating meta-level specifications for the target systems. We present the details of KGModel, illustrate the software tools we implemented, and show the suitability of the framework for real-world scenarios.
Published: 2022-01-01 · Pages: 2:524-2:526
Citations: 1
Placement of Workloads from Advanced RDBMS Architectures into Complex Cloud Infrastructure
Antony S. Higginson, Clive Bostock, N. Paton, Suzanne M. Embury
DOI: 10.48786/edbt.2022.43
Abstract: Capacity planning is an essential activity in the procurement and daily running of any multi-server computer system. Workload placement is a well-known problem, and several solutions exist to help address the capacity-planning questions of where, when, and how much resource is needed to place workloads of varying shapes (resources consumed). Bin-packing algorithms are used extensively for workload placement problems. However, we propose that extensions to existing bin-packing algorithms are required when dealing with workloads from advanced computational architectures such as clustering and consolidation (pluggable), or with workloads that exhibit complex patterns in their signals, such as seasonality, trend, and/or shocks (exogenous or otherwise). These extensions are especially needed when consolidating workloads together, for example consolidating multiple databases into one (pluggable databases) to reduce database-server sprawl on estates. In this paper we address bin-packing for singular or clustered environments and propose new algorithms that introduce a time element, giving a richer understanding of the resources requested when workloads are consolidated together and ensuring High Availability (HA) for workloads obtained from advanced database configurations. An experimental evaluation shows that our approach reduces the risk of provisioning wastage in pay-as-you-go cloud architectures.
Published: 2022-01-01 · Pages: 2:487-2:497
Citations: 0
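The "time element" the abstract adds to bin-packing can be sketched with a toy first-fit packer in which each workload is a per-time-slot demand vector rather than a single number. This is an illustrative simplification, not the paper's algorithm:

```python
def first_fit_time_aware(workloads, capacity):
    """First-fit placement where each workload is a per-time-slot demand
    vector; a workload fits on a server only if the per-slot capacity
    holds in every slot. Workloads that peak at different times can
    therefore share a server."""
    servers = []    # per-server list of per-slot usage
    placement = []  # server index chosen for each workload
    for demand in workloads:
        for i, usage in enumerate(servers):
            if all(u + d <= capacity for u, d in zip(usage, demand)):
                servers[i] = [u + d for u, d in zip(usage, demand)]
                placement.append(i)
                break
        else:
            servers.append(list(demand))
            placement.append(len(servers) - 1)
    return placement, len(servers)
```

With capacity 5, the two-slot demands [4, 1] and [1, 4] share one server because their peaks do not coincide, whereas a packer that reduces each workload to its peak (4 and 4) would need two servers. That is the kind of consolidation opportunity a time-aware view exposes.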
Learned Query Optimizer: At the Forefront of AI-Driven Databases
Rong Zhu, Ziniu Wu, Chengliang Chai, A. Pfadler, Bolin Ding, Guoliang Li, Jingren Zhou
DOI: 10.48786/edbt.2022.56
Abstract: Applying ML-based techniques to optimize traditional databases, or AI4DB, has recently become a hot research topic. Learned techniques for the query optimizer (QO) are at the forefront of AI4DB. The QO provides one of the most suitable testing grounds for ML techniques, and learned QO has exhibited superiority with ample evidence. In this tutorial, we aim to provide a broad and deep review and analysis of learned QO, covering algorithm design, real-world applications, and system deployment. On the algorithm side, we introduce advances in learning each individual QO component, as well as the QO module as a whole. On the system side, we analyze the challenges of, and some attempts at, deploying ML-based QO in actual DBMSs. Based on these, we summarize design principles and point out several future directions. We hope this tutorial can inspire and guide researchers and engineers working on learned QO, as well as on other topics in AI4DB.
Published: 2022-01-01 · Pages: 1-4
Citations: 6
Aggregation Detection in CSV Files
Lan Jiang, Gerardo Vitagliano, Mazhar Hameed, Felix Naumann
DOI: 10.48786/edbt.2022.10
Abstract: An aggregation is an arithmetic relationship between a single number and a set of numbers. Tables in raw CSV files often include various types of aggregations to summarize the data therein. Identifying aggregations in tables can help in understanding file structures, detecting data errors, and normalizing tables. However, recognizing aggregations in CSV files is not trivial, as these files often organize information in an ad-hoc manner, with aggregations appearing in arbitrary positions and exhibiting rounding errors. We propose the three-stage approach AggreCol to recognize aggregations of five types: sum, difference, average, division, and relative change. The first stage detects aggregations of each type individually. The second stage uses a set of pruning rules to remove spurious candidates. The last stage employs rules that allow individual detectors to skip specific parts of the file and retrieve more aggregations. We evaluated our approach on two manually annotated datasets, showing that AggreCol achieves 0.95 precision and recall for 91.1% and 86.3% of the files, respectively. We obtained similar results on an unseen test dataset, demonstrating the generalizability of the proposed techniques.
Published: 2022-01-01 · Pages: 2:207-2:219
Citations: 0
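To make the sum-aggregation case concrete, the hypothetical sketch below finds values in a row that equal the sum of some other values up to a rounding tolerance. It uses a brute-force search over subsets, which is fine for narrow tables but exponential in general; AggreCol's actual detectors and pruning rules are far more targeted:

```python
from itertools import combinations

def find_sum_aggregations(row, tol=0.01):
    """Return (aggregator_index, component_indices) pairs where one value
    in `row` equals the sum of two or more other values, up to `tol` to
    absorb rounding errors. Exhaustive search: illustrative only."""
    results = []
    n = len(row)
    for agg in range(n):
        others = [i for i in range(n) if i != agg]
        for size in range(2, len(others) + 1):
            for combo in combinations(others, size):
                if abs(sum(row[i] for i in combo) - row[agg]) <= tol:
                    results.append((agg, combo))
    return results
```

The tolerance parameter mirrors the rounding-error problem the abstract highlights: a stored total of 10.0 should still match components 3.33 + 6.67 even though 9.99 or 10.01 might have been printed.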
Distributed Training of Knowledge Graph Embedding Models using Ray
Nasrullah Sheikh, Xiao Qin, B. Reinwald
DOI: 10.48786/edbt.2022.48
Abstract: Knowledge graphs are at the core of numerous consumer and enterprise applications, where learned graph embeddings are used to derive insights for the users of these applications. Since knowledge graphs can be very large, learning embeddings is time- and resource-intensive and needs to be done in a distributed manner to leverage the compute resources of multiple machines. These applications therefore demand performance and scalability at the development and deployment stages, and require models to be developed and deployed in frameworks that address these requirements. Ray is an example of such a framework: it offers both ease of development and deployment, and enables running tasks in a distributed manner through simple APIs. In this work, we use Ray to build an end-to-end system for data preprocessing and distributed training of graph-neural-network-based knowledge graph embedding models. We apply our system to the link prediction task, i.e., using knowledge graph embeddings to discover links between nodes in graphs. We evaluate our system on a real-world industrial dataset and demonstrate significant speedups in both distributed data preprocessing and distributed model training. Compared to non-distributed learning, we achieved a training speedup of 12× with 4 Ray workers without any deterioration in the evaluation metrics.
Published: 2022-01-01 · Pages: 2:549-2:553
Citations: 0
MM-infer: A Tool for Inference of Multi-Model Schemas
P. Koupil, Sebastián Hricko, I. Holubová
DOI: 10.48786/edbt.2022.52
Abstract: The variety feature of Big Data, represented by multi-model data, has brought a new dimension of complexity to data management. The need to process a set of distinct but interlinked models is a challenging task. In our demonstration, we present our prototype implementation, MM-infer, which infers a common schema for multi-model data. It supports popular data models and all three types of their mutual combination, i.e., inter-model references, embedding of models, and cross-model redundancy. Following current trends, the implementation can efficiently process large amounts of data. To the best of our knowledge, ours is the first tool to address schema inference in the world of multi-model databases.
Published: 2022-01-01 · Pages: 2:566-2:569
Citations: 6
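Schema inference itself can be illustrated on the simplest single-model case: deriving a flat schema from a collection of JSON-like documents. The sketch below (not MM-infer's algorithm, which additionally handles references, embedding, and redundancy across models) records the observed value types per field and whether a field is optional:

```python
def infer_schema(documents):
    """Infer a flat schema from JSON-like dicts: for every field, collect
    the set of value-type names observed and whether the field is
    optional (missing from at least one document)."""
    schema = {}
    for doc in documents:
        for key, value in doc.items():
            info = schema.setdefault(key, {"types": set(), "optional": False})
            info["types"].add(type(value).__name__)
    for key, info in schema.items():
        info["optional"] = any(key not in doc for doc in documents)
    return schema
```

A multi-model tool must additionally reconcile such per-model schemas, e.g. recognizing that a document field and a graph-node property refer to the same logical entity.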
Integrating the Orca Optimizer into MySQL
A. Marathe, S. Lin, Weidong Yu, Kareem El Gebaly, P. Larson, Calvin Sun
DOI: 10.48786/edbt.2022.45
Abstract: The MySQL query optimizer was designed for relatively simple, OLTP-type queries; for more complex queries its limitations quickly become apparent. Join order optimization, for example, considers only left-deep plans and selects the join order using a greedy algorithm. Instead of continuing to patch the MySQL optimizer, why not delegate optimization of more complex queries to another, more capable optimizer? This paper reports on our experience integrating the Orca optimizer into MySQL. Orca is an extensible open-source query optimizer, originally used by Pivotal's Greenplum DBMS, specifically designed for demanding analytical workloads. Queries submitted to MySQL are routed to Orca for optimization, and the resulting plans are returned to MySQL for execution. Metadata and statistical information needed during optimization is retrieved from MySQL's data dictionary. Experimental results show substantial performance gains. On the TPC-DS benchmark, Orca's plans were over 10X faster on 10 of the 99 queries, and over 100X faster on 3 queries.
Published: 2022-01-01 · Pages: 2:511-2:523
Citations: 4
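The greedy left-deep join ordering the abstract attributes to MySQL can be sketched in a few lines. This is a deliberately simplified illustration with made-up table names and selectivities, not MySQL's or Orca's actual cost model:

```python
def greedy_left_deep_order(tables, est_rows, selectivity):
    """Greedy left-deep join ordering: start from the smallest table,
    then repeatedly append the table whose join yields the smallest
    estimated intermediate result, using per-pair join selectivities
    (unrelated pairs default to a cross product, selectivity 1.0)."""
    remaining = set(tables)
    order = [min(remaining, key=lambda t: est_rows[t])]
    remaining.discard(order[0])
    rows = est_rows[order[0]]
    while remaining:
        def est_join(t):
            # Best (smallest) selectivity against any already-joined table.
            sel = min(selectivity.get(frozenset((p, t)), 1.0) for p in order)
            return rows * est_rows[t] * sel
        nxt = min(remaining, key=est_join)
        rows = est_join(nxt)
        remaining.discard(nxt)
        order.append(nxt)
    return order, rows
```

A greedy pass like this never reconsiders earlier choices and never explores bushy plans, which is exactly the limitation that makes delegating complex queries to a Cascades-style optimizer such as Orca attractive.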