Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (Latest Publications)

Efficient Dynamic Clustering: Capturing Patterns from Historical Cluster Evolution
Binbin Gu, Saeed Kargar, Faisal Nawab
DOI: 10.48550/arXiv.2203.00812
Abstract: Clustering aims to group unlabeled objects into clusters based on the similarity inherent among them. It is important for many tasks such as anomaly detection, database sharding, and record linkage. Many clustering methods are batch algorithms that incur high overhead, as they cluster all objects in the database from scratch or assume a purely incremental workload. In practice, database objects are continuously updated, added, and removed, which makes previous results stale. Running batch algorithms continuously is infeasible in such scenarios because of the significant overhead it would incur. This is particularly the case in high-velocity settings such as Internet of Things applications. In this paper, we tackle the problem of clustering in high-velocity dynamic scenarios, where objects are continuously updated, inserted, and deleted. Specifically, we propose a generally dynamic approach to clustering that utilizes previous clustering results. Our system, DynamicC, uses a machine learning model that is augmented with an existing batch algorithm. The DynamicC model trains by observing the clustering decisions made by the batch algorithm. After training, the DynamicC model is used in cooperation with the batch algorithm to achieve both accurate and fast clustering decisions. Experimental results on four real-world datasets and one synthetic dataset show that our approach outperforms the state-of-the-art method while achieving clustering results similarly accurate to those of the baseline batch algorithm.
Published: 2022-03-02 · Pages: 2:351-2:363
Citations: 7
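The paper's model is not reproduced here. As a rough, hypothetical illustration of the incremental-clustering idea the abstract describes (reuse previous results; fall back to an expensive decision only when the cheap path is uncertain), the sketch below assigns a new point to the nearest existing cluster when its centroid is close enough, with the distance threshold standing in for DynamicC's learned accept/fallback decision:

```python
import math

def centroid(points):
    """Mean point of a cluster's members."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def incremental_assign(clusters, point, threshold=1.0):
    """Assign `point` to the nearest existing cluster if its centroid is
    within `threshold`; otherwise open a new cluster. The threshold test
    is a stand-in for a learned decision model, not the paper's method."""
    best, best_dist = None, float("inf")
    for cid, members in clusters.items():
        d = math.dist(centroid(members), point)
        if d < best_dist:
            best, best_dist = cid, d
    if best is not None and best_dist <= threshold:
        clusters[best].append(point)
        return best
    new_id = max(clusters, default=-1) + 1
    clusters[new_id] = [point]
    return new_id
```

Each insertion touches only centroids instead of re-clustering the whole database, which is the cost profile the abstract contrasts with batch algorithms.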
3DPro: Querying Complex Three-Dimensional Data with Progressive Compression and Refinement
Dejun Teng, Yanhui Liang, Furqan Baig, Jun Kong, Vo Hoang, Fusheng Wang
DOI: 10.48786/edbt.2022.02
Abstract: Large-scale three-dimensional spatial data has gained increasing attention with the development of self-driving vehicles, mineral exploration, CAD, and human atlases. Such 3D objects are often represented with a polygonal model at high resolution to preserve accuracy. This poses major challenges for 3D data management and spatial queries due to the massive number of 3D objects, e.g., trillions of 3D cells, and the high complexity of 3D geometric computation. Traditional spatial querying methods in the Filter-Refine paradigm focus mainly on indexing-based filtering using approximations such as minimal bounding boxes, and largely neglect the heavy computation in the refinement step at the intra-geometry level, which often dominates the cost of query processing. In this paper, we introduce 3DPro, a system that supports efficient spatial queries for complex 3D objects. 3DPro uses progressive compression of 3D objects that preserves multiple levels of detail, which significantly reduces the size of the objects and lets the data fit in memory. Through a novel Filter-Progressive-Refine paradigm, 3DPro can return query results early whenever possible, minimizing decompression and geometric computation on higher-resolution representations of 3D objects. Our experiments demonstrate that 3DPro outperforms state-of-the-art 3D data processing techniques by up to an order of magnitude for typical spatial queries.
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/7e/40/nihms-1827080.PMC9540604.pdf
Published: 2022-03-01 · Pages: 104-117
Citations: 0
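The Filter-Refine pattern the abstract builds on can be illustrated independently of 3D geometry. The 2D sketch below (hypothetical object layout, not from the paper) filters candidates with cheap bounding-box tests and runs the exact containment check only on survivors; 3DPro's contribution is inserting progressive decompression between these two steps:

```python
def bbox(points):
    """Axis-aligned bounding box (minx, miny, maxx, maxy) of a point set."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    """True when two (minx, miny, maxx, maxy) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def query_intersecting(objects, region):
    """Filter-Refine query: cheap bounding-box test first (filter), exact
    point-in-region check only for the survivors (refine)."""
    hits = []
    for name, points in objects.items():
        if not boxes_overlap(bbox(points), region):   # filter step
            continue
        if any(region[0] <= x <= region[2] and region[1] <= y <= region[3]
               for x, y in points):                   # refine step
            hits.append(name)
    return hits
```

The refine step is the expensive part the abstract says dominates query cost; returning early from the filter avoids it whenever the boxes already disagree.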
Differentially-Private Publication of Origin-Destination Matrices with Intermediate Stops
Sina Shaham, Gabriel Ghinita, C. Shahabi
DOI: 10.48786/edbt.2022.04
Abstract: Conventional origin-destination (OD) matrices record the count of trips between pairs of start and end locations, and have been used extensively in transportation, traffic planning, and related areas. More recently, due to use cases such as modeling the spread of the COVID-19 pandemic, it has become increasingly important to also record intermediate points along an individual's path, rather than only the trip's start and end points. This can be achieved with a multi-dimensional frequency matrix over a partitioning of the data space at the desired level of granularity. However, serious privacy concerns arise when releasing OD matrix data, especially when multiple intermediate points are added, which makes individual trajectories more distinguishable to an attacker. To address this threat, we propose a technique for privacy-preserving publication of multi-dimensional OD matrices that achieves differential privacy (DP), the de facto standard for private data release. We propose a family of approaches that factor in important data properties, such as density and homogeneity, to build OD matrices that provide provable protection guarantees while preserving query accuracy. Extensive experiments on real and synthetic datasets show that the proposed approaches clearly outperform the existing state of the art.
Published: 2022-02-24 · Pages: 2:131-2:142
Citations: 2
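As background for the DP guarantee mentioned above, here is a minimal Laplace-mechanism sketch for releasing a count matrix. This is the textbook baseline only; the paper's contribution is the density- and homogeneity-aware matrix construction layered on top of such a mechanism, which is not reproduced here:

```python
import math
import random

def dp_release(matrix, epsilon, sensitivity=1.0):
    """Release a matrix of counts under epsilon-differential privacy by
    adding Laplace noise of scale sensitivity/epsilon to every cell,
    then clamping to non-negative integers for readability."""
    scale = sensitivity / epsilon

    def laplace_noise():
        # Inverse-CDF sampling of the Laplace(0, scale) distribution.
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    return [[max(0, round(cell + laplace_noise())) for cell in row]
            for row in matrix]
```

Smaller epsilon means larger noise and stronger privacy; adding intermediate stops enlarges the matrix, which is exactly the accuracy-vs-privacy tension the paper addresses.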
Model-Independent Design of Knowledge Graphs - Lessons Learnt From Complex Financial Graphs
Luigi Bellomarini, Andrea Gentili, Eleonora Laurenza, Emanuel Sallinger
DOI: 10.48786/edbt.2022.46
Abstract: We propose a model-independent design framework for Knowledge Graphs (KGs), capitalizing on our experience in KGs and model management for the rollout of a very large and complex financial KG for the Central Bank of Italy. KGs have recently garnered increasing attention from industry and are currently exploited in a variety of applications. Most common notions of KG share the presence of an extensional component, typically implemented as a graph database storing the enterprise data, and an intensional component that derives new implicit knowledge in the form of new nodes and edges. Our framework, KGModel, is based on a meta-level approach: the data engineer designs the extensional and intensional components of the KG, that is, the graph schema and the reasoning rules, at the meta-level. Then, in a model-driven fashion, this high-level specification is translated into schema definitions and reasoning rules that can be deployed on the target database systems and state-of-the-art reasoners. Our framework offers a model-independent visual modeling language, a logic-based language for the intensional component, and a set of complementary software tools for translating meta-level specifications for the target systems. We present the details of KGModel, illustrate the software tools we implemented, and show the suitability of the framework for real-world scenarios.
Published: 2022-01-01 · Pages: 2:524-2:526
Citations: 1
Placement of Workloads from Advanced RDBMS Architectures into Complex Cloud Infrastructure
Antony S. Higginson, Clive Bostock, N. Paton, Suzanne M. Embury
DOI: 10.48786/edbt.2022.43
Abstract: Capacity planning is an essential activity in the procurement and daily running of any multi-server computer system. Workload placement is a well-known problem, and several solutions exist to help address the capacity-planning questions of where, when, and how much resource is needed to place workloads of varying shapes (resources consumed). Bin-packing algorithms are used extensively for workload placement problems. However, we propose that extensions to existing bin-packing algorithms are required when dealing with workloads from advanced computational architectures such as clustering and consolidation (pluggable), or with workloads that exhibit complex patterns in their signals, such as seasonality, trend, and/or shocks (exogenous or otherwise). These extensions are especially needed when consolidating workloads together, for example consolidating multiple databases into one (pluggable databases) to reduce database-server sprawl on estates. In this paper we address bin-packing for singular or clustered environments and propose new algorithms that introduce a time element, giving a richer understanding of the resources requested when workloads are consolidated together and ensuring High Availability (HA) for workloads obtained from advanced database configurations. An experimental evaluation shows that our approach reduces the risk of provisioning wastage in pay-as-you-go cloud architectures.
Published: 2022-01-01 · Pages: 2:487-2:497
Citations: 0
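The "time element" the abstract adds to bin-packing can be sketched with a toy first-fit packer in which each workload is a per-time-slot demand vector rather than a single number. This is an illustrative simplification, not the paper's algorithm:

```python
def first_fit_time_aware(workloads, capacity):
    """First-fit placement where each workload is a per-time-slot demand
    vector; a workload fits on a server only if the per-slot capacity
    holds in every slot. Workloads that peak at different times can
    therefore share a server."""
    servers = []    # per-server list of per-slot usage
    placement = []  # server index chosen for each workload
    for demand in workloads:
        for i, usage in enumerate(servers):
            if all(u + d <= capacity for u, d in zip(usage, demand)):
                servers[i] = [u + d for u, d in zip(usage, demand)]
                placement.append(i)
                break
        else:
            servers.append(list(demand))
            placement.append(len(servers) - 1)
    return placement, len(servers)
```

With capacity 5, the two-slot demands [4, 1] and [1, 4] share one server because their peaks do not coincide, whereas a packer that reduces each workload to its peak (4 and 4) would need two servers. That is the kind of consolidation opportunity a time-aware view exposes.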
Learned Query Optimizer: At the Forefront of AI-Driven Databases
Rong Zhu, Ziniu Wu, Chengliang Chai, A. Pfadler, Bolin Ding, Guoliang Li, Jingren Zhou
DOI: 10.48786/edbt.2022.56
Abstract: Applying ML-based techniques to optimize traditional databases, or AI4DB, has recently become a hot research topic. Learned techniques for the query optimizer (QO) are at the forefront of AI4DB. The QO provides one of the most suitable testing grounds for ML techniques, and learned QO has exhibited superiority with ample evidence. In this tutorial, we aim to provide a broad and deep review and analysis of learned QO, covering algorithm design, real-world applications, and system deployment. On the algorithm side, we introduce advances in learning each individual QO component, as well as the QO module as a whole. On the system side, we analyze the challenges of, and some attempts at, deploying ML-based QO in actual DBMSs. Based on these, we summarize design principles and point out several future directions. We hope this tutorial can inspire and guide researchers and engineers working on learned QO, as well as on other topics in AI4DB.
Published: 2022-01-01 · Pages: 1-4
Citations: 6
Aggregation Detection in CSV Files
Lan Jiang, Gerardo Vitagliano, Mazhar Hameed, Felix Naumann
DOI: 10.48786/edbt.2022.10
Abstract: An aggregation is an arithmetic relationship between a single number and a set of numbers. Tables in raw CSV files often include various types of aggregations to summarize the data therein. Identifying aggregations in tables can help in understanding file structures, detecting data errors, and normalizing tables. However, recognizing aggregations in CSV files is not trivial, as these files often organize information in an ad-hoc manner, with aggregations appearing in arbitrary positions and exhibiting rounding errors. We propose the three-stage approach AggreCol to recognize aggregations of five types: sum, difference, average, division, and relative change. The first stage detects aggregations of each type individually. The second stage uses a set of pruning rules to remove spurious candidates. The last stage employs rules that allow individual detectors to skip specific parts of the file and retrieve more aggregations. We evaluated our approach on two manually annotated datasets, showing that AggreCol achieves 0.95 precision and recall for 91.1% and 86.3% of the files, respectively. We obtained similar results on an unseen test dataset, demonstrating the generalizability of the proposed techniques.
Published: 2022-01-01 · Pages: 2:207-2:219
Citations: 0
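To make the sum-aggregation case concrete, the hypothetical sketch below finds values in a row that equal the sum of some other values up to a rounding tolerance. It uses a brute-force search over subsets, which is fine for narrow tables but exponential in general; AggreCol's actual detectors and pruning rules are far more targeted:

```python
from itertools import combinations

def find_sum_aggregations(row, tol=0.01):
    """Return (aggregator_index, component_indices) pairs where one value
    in `row` equals the sum of two or more other values, up to `tol` to
    absorb rounding errors. Exhaustive search: illustrative only."""
    results = []
    n = len(row)
    for agg in range(n):
        others = [i for i in range(n) if i != agg]
        for size in range(2, len(others) + 1):
            for combo in combinations(others, size):
                if abs(sum(row[i] for i in combo) - row[agg]) <= tol:
                    results.append((agg, combo))
    return results
```

The tolerance parameter mirrors the rounding-error problem the abstract highlights: a stored total of 10.0 should still match components 3.33 + 6.67 even though 9.99 or 10.01 might have been printed.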
Distributed Training of Knowledge Graph Embedding Models using Ray
Nasrullah Sheikh, Xiao Qin, B. Reinwald
DOI: 10.48786/edbt.2022.48
Abstract: Knowledge graphs are at the core of numerous consumer and enterprise applications, where learned graph embeddings are used to derive insights for the users of these applications. Since knowledge graphs can be very large, learning embeddings is time- and resource-intensive and needs to be done in a distributed manner to leverage the compute resources of multiple machines. These applications therefore demand performance and scalability at the development and deployment stages, and require models to be developed and deployed in frameworks that address these requirements. Ray is an example of such a framework: it offers both ease of development and deployment, and enables running tasks in a distributed manner through simple APIs. In this work, we use Ray to build an end-to-end system for data preprocessing and distributed training of graph-neural-network-based knowledge graph embedding models. We apply our system to the link prediction task, i.e., using knowledge graph embeddings to discover links between nodes in graphs. We evaluate our system on a real-world industrial dataset and demonstrate significant speedups in both distributed data preprocessing and distributed model training. Compared to non-distributed learning, we achieved a training speedup of 12× with 4 Ray workers without any deterioration in the evaluation metrics.
Published: 2022-01-01 · Pages: 2:549-2:553
Citations: 0
MM-infer: A Tool for Inference of Multi-Model Schemas
P. Koupil, Sebastián Hricko, I. Holubová
DOI: 10.48786/edbt.2022.52
Abstract: The variety feature of Big Data, represented by multi-model data, has brought a new dimension of complexity to data management. The need to process a set of distinct but interlinked models is a challenging task. In our demonstration, we present our prototype implementation, MM-infer, which infers a common schema for multi-model data. It supports popular data models and all three types of their mutual combination, i.e., inter-model references, embedding of models, and cross-model redundancy. Following current trends, the implementation can efficiently process large amounts of data. To the best of our knowledge, ours is the first tool to address schema inference in the world of multi-model databases.
Published: 2022-01-01 · Pages: 2:566-2:569
Citations: 6
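Schema inference itself can be illustrated on the simplest single-model case: deriving a flat schema from a collection of JSON-like documents. The sketch below (not MM-infer's algorithm, which additionally handles references, embedding, and redundancy across models) records the observed value types per field and whether a field is optional:

```python
def infer_schema(documents):
    """Infer a flat schema from JSON-like dicts: for every field, collect
    the set of value-type names observed and whether the field is
    optional (missing from at least one document)."""
    schema = {}
    for doc in documents:
        for key, value in doc.items():
            info = schema.setdefault(key, {"types": set(), "optional": False})
            info["types"].add(type(value).__name__)
    for key, info in schema.items():
        info["optional"] = any(key not in doc for doc in documents)
    return schema
```

A multi-model tool must additionally reconcile such per-model schemas, e.g. recognizing that a document field and a graph-node property refer to the same logical entity.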
Integrating the Orca Optimizer into MySQL
A. Marathe, S. Lin, Weidong Yu, Kareem El Gebaly, P. Larson, Calvin Sun
DOI: 10.48786/edbt.2022.45
Abstract: The MySQL query optimizer was designed for relatively simple, OLTP-type queries; for more complex queries its limitations quickly become apparent. Join order optimization, for example, considers only left-deep plans and selects the join order using a greedy algorithm. Instead of continuing to patch the MySQL optimizer, why not delegate optimization of more complex queries to another, more capable optimizer? This paper reports on our experience integrating the Orca optimizer into MySQL. Orca is an extensible open-source query optimizer, originally used by Pivotal's Greenplum DBMS, specifically designed for demanding analytical workloads. Queries submitted to MySQL are routed to Orca for optimization, and the resulting plans are returned to MySQL for execution. Metadata and statistical information needed during optimization is retrieved from MySQL's data dictionary. Experimental results show substantial performance gains. On the TPC-DS benchmark, Orca's plans were over 10X faster on 10 of the 99 queries, and over 100X faster on 3 queries.
Published: 2022-01-01 · Pages: 2:511-2:523
Citations: 4
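The greedy left-deep join ordering the abstract attributes to MySQL can be sketched in a few lines. This is a deliberately simplified illustration with made-up table names and selectivities, not MySQL's or Orca's actual cost model:

```python
def greedy_left_deep_order(tables, est_rows, selectivity):
    """Greedy left-deep join ordering: start from the smallest table,
    then repeatedly append the table whose join yields the smallest
    estimated intermediate result, using per-pair join selectivities
    (unrelated pairs default to a cross product, selectivity 1.0)."""
    remaining = set(tables)
    order = [min(remaining, key=lambda t: est_rows[t])]
    remaining.discard(order[0])
    rows = est_rows[order[0]]
    while remaining:
        def est_join(t):
            # Best (smallest) selectivity against any already-joined table.
            sel = min(selectivity.get(frozenset((p, t)), 1.0) for p in order)
            return rows * est_rows[t] * sel
        nxt = min(remaining, key=est_join)
        rows = est_join(nxt)
        remaining.discard(nxt)
        order.append(nxt)
    return order, rows
```

A greedy pass like this never reconsiders earlier choices and never explores bushy plans, which is exactly the limitation that makes delegating complex queries to a Cascades-style optimizer such as Orca attractive.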