Advances in database technology : proceedings. International Conference on Extending Database Technology: latest articles

A Supervised Skyline-Based Algorithm for Spatial Entity Linkage

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.48786/edbt.2022.11

Suela Isaj, Vassilis Kaffes, T. Pedersen, G. Giannopoulos

Abstract: The ease of publishing data on the web has contributed to larger and more diverse types of data. Entities that refer to a physical place and are characterized by a location and different attributes are named spatial entities. Even though the amount of spatial entity data from multiple sources keeps increasing, facilitating the development of richer, more accurate and more comprehensive geospatial applications and services, there is unavoidable redundancy and ambiguity. We address the problem of spatial entity linkage with SkylineExplore-Trained (SkyEx-T), a skyline-based algorithm that can label an entity pair as being the same physical entity or not. We introduce LinkGeoML-eXtended (LGM-X), a meta-similarity function that computes similarity features specifically tailored to the specificities of spatial entities. The skylines of SkyEx-T are created using a preference function, which ranks the pairs based on the likelihood of referring to the same entity. We propose deriving the preference function using a tiny training set (down to 0.05% of the dataset). Additionally, we provide a theoretical guarantee for the cut-off that can best separate the classes, and we show experimentally that it results in a near-optimal F-measure (on average only 2% loss). SkyEx-T yields an F-measure of 0.71-0.74 and beats the existing non-skyline-based baselines by a margin of 0.11-0.39 in F-measure. When compared to machine learning techniques, SkyEx-T achieves similar accuracy (sometimes slightly better with very small training sets) and, more importantly, has no parameters to tune and a model that is already explainable (no need for further actions to achieve explainability).

Pages: 2:220-2:233
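
The skyline idea described in this abstract can be illustrated with a minimal sketch. It is not the SkyEx-T algorithm itself: plain Pareto dominance is assumed in place of the learned preference function, the two similarity features per candidate pair are made-up values, and the names `dominates`, `skyline_layers`, `label_pairs` and the `cutoff` parameter are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every feature and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points: List[Sequence[float]]) -> List[List[int]]:
    """Peel successive skylines; returns index lists, best layer first."""
    remaining = list(range(len(points)))
    layers = []
    while remaining:
        # A point belongs to the current skyline if no remaining point dominates it.
        layer = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        layers.append(layer)
        in_layer = set(layer)
        remaining = [i for i in remaining if i not in in_layer]
    return layers


def label_pairs(points: List[Sequence[float]], cutoff: int) -> List[bool]:
    """Label pairs in the first `cutoff` skyline layers as matches (assumed rule)."""
    labels = [False] * len(points)
    for layer in skyline_layers(points)[:cutoff]:
        for i in layer:
            labels[i] = True
    return labels


# Hypothetical (name similarity, location similarity) scores for five candidate pairs.
pairs = [(0.9, 0.8), (0.7, 0.9), (0.6, 0.6), (0.2, 0.1), (0.5, 0.7)]
layers = skyline_layers(pairs)          # [[0, 1], [2, 4], [3]]
labels = label_pairs(pairs, cutoff=1)   # [True, True, False, False, False]
```

In this toy setting the cut-off is simply the number of skyline layers treated as matches; the paper derives its cut-off with a theoretical guarantee, which this sketch does not attempt to reproduce.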

Citations: 0

Gamma Probabilistic Databases: Learning from Exchangeable Query-Answers

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.48786/edbt.2022.14

Niccolò Meneghetti, Ouael Ben Amara

Citations: 0

Evaluation of Algorithms for Interaction-Sparse Recommendations: Neural Networks don't Always Win

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.48786/edbt.2022.42

Yasamin Klingler, Claude Lehmann, J. Monteiro, Carlo Saladin, A. Bernstein, Kurt Stockinger

Abstract: In recent years, top-K recommender systems with implicit feedback data have gained interest in many real-world business scenarios. In particular, neural networks have shown promising results on these tasks. However, while traditional recommender systems are built on datasets with frequent user interactions, insurance recommenders often have access to a very limited amount of user interactions, as people only buy a few insurance products. In this paper, we shed new light on the problem of top-K recommendations for interaction-sparse recommender problems. In particular, we analyze six different recommender algorithms: a popularity-based baseline, which we compare against two matrix factorization methods (SVD++, ALS), one neural network approach (JCA), and two combinations of neural network and factorization machine approaches (DeepFM, NeuFM). We evaluate these algorithms on six different interaction-sparse datasets and one dataset with a less sparse interaction pattern to elucidate the unique behavior of interaction-sparse datasets. In our experimental evaluation based on real-world insurance data, we demonstrate that DeepFM shows the best performance, followed by JCA and SVD++, which indicates that neural network approaches are the dominant technologies. However, for the remaining five datasets we observe a different pattern. Overall, the matrix factorization method SVD++ is the winner. Surprisingly, the simple popularity-based approach comes out second, followed by the neural network approach JCA. In summary, our experimental evaluation for interaction-sparse datasets demonstrates that, in general, matrix factorization methods outperform neural network approaches. As a consequence, traditional well-established methods should be part of the portfolio of algorithms to solve real-world interaction-sparse recommender problems.

Pages: 2:475-2:486
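
To make the popularity-based baseline mentioned in this abstract concrete, here is a minimal sketch of top-K recommendation from implicit feedback. It is an illustration only, not the evaluated implementation: the function name `popularity_top_k`, the tie-breaking by item name, and the toy insurance products are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Set


def popularity_top_k(interactions: Dict[str, Set[str]], user: str, k: int) -> List[str]:
    """Recommend the k globally most popular items the user has not interacted with."""
    # Count how many users interacted with each item (implicit feedback).
    counts = Counter(item for items in interactions.values() for item in items)
    seen = interactions.get(user, set())
    # Sort by descending popularity; break ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [item for item in ranked if item not in seen][:k]


# Hypothetical, very sparse interaction data: each user holds only a few products.
interactions = {
    "u1": {"car", "home"},
    "u2": {"car"},
    "u3": {"car", "travel"},
    "u4": {"home", "life"},
}

recs_u2 = popularity_top_k(interactions, "u2", k=2)    # ['home', 'life']
recs_new = popularity_top_k(interactions, "u5", k=2)   # ['car', 'home']
```

Because the ranking ignores the individual user apart from filtering already-seen items, the baseline still returns sensible results for users with one or zero interactions, which may be part of why such a simple method stays competitive on sparse data.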

Citations: 2

Conceptual models and databases for searching the genome

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.48786/edbt.2022.57

Anna Bernasconi, Pietro Pinoli

Abstract: Genomics is an extremely complex domain, in terms of concepts, their relations, and their representations in data. This tutorial introduces the use of ER models in the context of genomic systems: conceptual models are of great help for simplifying this domain and making it actionable. We carry out a review of successful models presented in the literature for representing biologically relevant entities and grounding them in databases. We draw a distinction between conceptual models that aim to explain the domain and conceptual models that aim to support database design and heterogeneous data integration. Genomic experiments and/or sequences are described by several metadata, specifying information on the sampled organism, the technology used, and the organizational process behind the experiment. In contrast, we call data the actual regions of the genome that have been read by sequencing technologies and encoded into a machine-readable representation. First, we show how data and metadata can be modeled, then we exploit the proposed models for designing search systems, visualizers, and analysis environments. Both domains of human genomics and viral genomics are addressed, surveying several use cases and applications of broader public interest. The tutorial is relevant to the EDBT community because it demonstrates the usefulness of conceptual models' principles within very current domains; in addition, it offers a concrete example of conceptual models' use, setting the premises for interdisciplinary collaboration with a greater public (possibly including life science researchers).

Pages: 1-4

Citations: 0

A Neural Approach to Forming Coherent Teams in Collaboration Networks

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.48786/edbt.2022.37

Radin Hamidi Rad, Shirin Seyedsalehi, M. Kargar, Morteza Zihayat, E. Bagheri

Citations: 4

Implementing Distributed Similarity Joins using Locality Sensitive Hashing

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.5441/002/edbt.2022.07

Martin Aumüller, Matteo Ceccarello

Citations: 1

Voyager: Data Discovery and Integration for Onboarding in Data Science

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.48786/edbt.2022.47

Alex Bogatu, N. Paton, Mark Douthwaite, A. Freitas

Citations: 2

JupySim: Jupyter Notebook Similarity Search System

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.48786/edbt.2022.49

Misato Horiuchi, Yuya Sasaki, Chuan Xiao, Makoto Onizuka

Citations: 3

Unsupervised Selectivity Estimation by Integrating Gaussian Mixture Models and an Autoregressive Model

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.48786/edbt.2022.13

Zizhong Meng, Peizhi Wu, Gao Cong, Rong Zhu, Shuai Ma

Abstract: Selectivity estimation is a fundamental database task, which has been studied for decades. A recent trend is to use deep learning methods for selectivity estimation. Deep autoregressive models have been reported to achieve excellent accuracy. However, if the relation has continuous attributes with large domain sizes, the search space of query inference on deep autoregressive models can be very large, resulting in inaccurate estimation and inefficient inference. To address this challenge, we propose a new model that integrates multiple Gaussian mixture models and a deep autoregressive model. On the one hand, Gaussian mixture models can fit the distribution of continuous attributes and reduce their domain sizes. On the other hand, the deep autoregressive model can learn the joint data distribution with reduced domain attributes. In experiments, we compare with multiple baselines on 4 real-world datasets containing continuous attributes, and the experimental results demonstrate that our model can achieve up to 20 times higher accuracy than the second-best estimators, while using less space and inference time.

Pages: 2:247-2:259

Citations: 1

Towards A General SIMD Concurrent Approach to Accelerating Integer Compression Algorithms

Advances in database technology : proceedings. International Conference on Extending Database Technology Pub Date : 2022-01-01 DOI: 10.48786/edbt.2022.32

Juliana Hildebrandt, Dirk Habich, Wolfgang Lehner

Abstract: Integer compression algorithms play an important role in column-oriented data systems. Previous research has shown that the vectorized implementation of these algorithms based on the Single Instruction Multiple Data (SIMD) parallel paradigm can multiply the compression as well as decompression speeds. While a scalar compression algorithm usually compresses a block of N consecutive integers, the state-of-the-art SIMD implementation scales the block size to k ∗ N, with k as the number of elements which can be simultaneously processed in a SIMD register. However, this means that as the SIMD register size increases, the block of integer values for compression also grows, which can have a negative effect on the compression ratio. In this paper, we analyze this effect and present an idea for a novel general approach for the SIMD implementation of integer compression algorithms to overcome that effect. Our novel idea is to concurrently compress k different blocks of size N within SIMD registers. To show the applicability of our idea, we present initial evaluation results for a heavily used compression algorithm and show that our approach can lead to more responsible usage of main memory resources.

Pages: 2:414-2:418
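
The block-size effect described in this abstract can be illustrated with a small size calculation. The sketch below is not the paper's algorithm and uses no SIMD: it assumes a simple binary-packing scheme in which every block stores a fixed-size header plus all of its values at the bit width of the block's largest value. The function name `packed_bits`, the 8-bit header, and the sample data are hypothetical.

```python
from typing import List


def packed_bits(values: List[int], block_size: int, header_bits: int = 8) -> int:
    """Total bits when each block stores a width header plus values at that width.

    Assumes non-negative integers.
    """
    total = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # One outlier forces the whole block to use the outlier's bit width.
        width = max(v.bit_length() for v in block)
        total += header_bits + width * len(block)
    return total


# 16 small values with a single outlier (made-up data).
values = [3] * 12 + [1000] + [3] * 3
n, k = 4, 4  # scalar block size N and an assumed SIMD lane count k

bits_blocks_of_n = packed_bits(values, block_size=n)       # 96
bits_blocks_of_kn = packed_bits(values, block_size=k * n)  # 168
```

With blocks of size N, the outlier only inflates one block; with a single block of size k ∗ N it inflates all 16 values. Compressing k separate blocks of size N concurrently, as the abstract proposes, would keep the smaller granularity, although how the lanes are laid out in registers is beyond this sketch.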

Citations: 1