Advances in database technology : proceedings. International Conference on Extending Database Technology: Latest Publications

A Supervised Skyline-Based Algorithm for Spatial Entity Linkage
Suela Isaj, Vassilis Kaffes, T. Pedersen, G. Giannopoulos
DOI: https://doi.org/10.48786/edbt.2022.11; pages 2:220-2:233; published 2022
Abstract: The ease of publishing data on the web has contributed to larger volumes and more diverse types of data. Entities that refer to a physical place and are characterized by a location and various attributes are called spatial entities. The amount of spatial entity data from multiple sources keeps increasing, which facilitates the development of richer, more accurate and more comprehensive geospatial applications and services. However, this data inevitably contains redundancy and ambiguity. We address the problem of spatial entity linkage with SkylineExplore-Trained (SkyEx-T), a skyline-based algorithm that labels an entity pair as referring to the same physical entity or not. We introduce LinkGeoML-eXtended (LGM-X), a meta-similarity function that computes similarity features tailored to the specificities of spatial entities. The skylines of SkyEx-T are created using a preference function, which ranks the pairs by their likelihood of referring to the same entity. We propose deriving the preference function from a tiny training set (down to 0.05% of the dataset). Additionally, we provide a theoretical guarantee for the cut-off that best separates the classes. We show experimentally that this cut-off yields a near-optimal F-measure (on average only 2% loss). SkyEx-T achieves an F-measure of 0.71-0.74 and beats the existing non-skyline-based baselines by a margin of 0.11-0.39 in F-measure. Compared to machine learning techniques, SkyEx-T achieves similar accuracy (sometimes slightly better with very small training sets). More importantly, it has no parameters to tune and produces a model that is explainable by construction, with no further steps needed to achieve explainability.
Citations: 0
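The skyline idea in the SkyEx-T abstract can be illustrated with a small Python sketch. It makes three simplifying assumptions:
- Each candidate pair is already reduced to a tuple of similarity scores, standing in for LGM-X features.
- Plain Pareto dominance replaces the learned preference function.
- The `cutoff` argument is a hypothetical parameter, not the paper's theoretically derived cut-off.

So it only shows how successive skylines stratify pairs.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one similarity score per feature, higher = more similar.
Pair = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as similar as b on every feature and strictly more on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(pairs: List[Pair]) -> List[List[Pair]]:
    """Peel off successive skylines: layer 0 holds the non-dominated pairs,
    layer 1 the non-dominated pairs among the rest, and so on."""
    remaining = list(pairs)
    layers: List[List[Pair]] = []
    while remaining:
        # dominates(p, p) is False, so comparing p with itself is harmless.
        layer = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


def label_pairs(pairs: List[Pair], cutoff: int) -> List[Tuple[Pair, bool]]:
    """Label pairs in the first `cutoff` layers as matches (hypothetical cut-off)."""
    labelled: List[Tuple[Pair, bool]] = []
    for depth, layer in enumerate(skyline_layers(pairs)):
        labelled.extend((p, depth < cutoff) for p in layer)
    return labelled
```

Every pair in a later layer is dominated by at least one pair in the layer before it. This is why a single cut-off over layers can act as a match/non-match boundary.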
Gamma Probabilistic Databases: Learning from Exchangeable Query-Answers
Niccolò Meneghetti, Ouael Ben Amara
DOI: https://doi.org/10.48786/edbt.2022.14; pages 2:260-2:273; published 2022
Abstract: In this paper we propose a novel knowledge compilation technique that compiles Bayesian inference procedures, starting from probabilistic programs expressed in terms of probabilistic query-answers. To do so, we extend the framework of Dirichlet Probabilistic Databases with the ability to process exchangeable observations of query-answers. We show that the resulting framework can encode non-trivial models, such as Latent Dirichlet Allocation and the Ising model, and can generate high-performance Gibbs samplers for both models.
Citations: 0
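The Gamma PDB abstract mentions generated Gibbs samplers for the Ising model. For readers unfamiliar with that target, here is a hand-written single-site Gibbs sampler for a 2D Ising lattice in Python. It is not output of the paper's compilation technique. The periodic boundary, the coupling `beta` and the external `field` are assumptions chosen for illustration.

```python
import math
import random
from typing import List

Grid = List[List[int]]  # each spin is -1 or +1


def prob_spin_up(neighbour_sum: int, beta: float, field: float = 0.0) -> float:
    """P(s_i = +1 | neighbours) for p(s) proportional to
    exp(beta * sum_<ij> s_i * s_j + field * sum_i s_i), as a numerically stable logistic."""
    x = 2.0 * (beta * neighbour_sum + field)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gibbs_sweep(grid: Grid, beta: float, rng: random.Random, field: float = 0.0) -> None:
    """One in-place sweep of single-site Gibbs updates on a periodic 2D lattice."""
    n, m = len(grid), len(grid[0])
    for i in range(n):
        for j in range(m):
            # Sum of the four nearest neighbours, wrapping around the edges.
            s = (grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
                 + grid[i][(j - 1) % m] + grid[i][(j + 1) % m])
            grid[i][j] = 1 if rng.random() < prob_spin_up(s, beta, field) else -1
```

Each update resamples one spin from its exact conditional distribution given its four neighbours, which is the defining step of a Gibbs sampler.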
Conceptual models and databases for searching the genome
Anna Bernasconi, Pietro Pinoli
DOI: https://doi.org/10.48786/edbt.2022.57; pages 1-4; published 2022
Abstract: Genomics is an extremely complex domain in terms of its concepts, their relations, and their representations in data. This tutorial introduces the use of ER models in the context of genomic systems: conceptual models are of great help in simplifying this domain and making it actionable. We review successful models presented in the literature for representing biologically relevant entities and grounding them in databases. We distinguish between conceptual models that aim to explain the domain and conceptual models that aim to support database design and heterogeneous data integration. Genomic experiments and/or sequences are described by metadata specifying information on the sampled organism, the technology used, and the organizational process behind the experiment. By contrast, we call data the actual regions of the genome that have been read by sequencing technologies and encoded into a machine-readable representation. First, we show how data and metadata can be modeled. We then exploit the proposed models for designing search systems, visualizers, and analysis environments. Both human genomics and viral genomics are addressed, surveying several use cases and applications of broader public interest. The tutorial is relevant to the EDBT community because it demonstrates the usefulness of conceptual modeling principles in very current domains. In addition, it offers a concrete example of the use of conceptual models, setting the premises for interdisciplinary collaboration with a broader public (possibly including life science researchers).
Citations: 0
Implementing Distributed Similarity Joins using Locality Sensitive Hashing
Martin Aumüller, Matteo Ceccarello
DOI: https://doi.org/10.5441/002/edbt.2022.07; pages 1:78-1:90; published 2022
Citations: 1
A Neural Approach to Forming Coherent Teams in Collaboration Networks
Radin Hamidi Rad, Shirin Seyedsalehi, M. Kargar, Morteza Zihayat, E. Bagheri
DOI: https://doi.org/10.48786/edbt.2022.37; pages 2:440-2:444; published 2022
Abstract: We study team formation, whose goal is to form a team of experts who collectively cover a set of desirable skills. This problem has mainly been addressed in one of two ways. Graph search techniques look for subgraphs that satisfy a set of skill requirements. Neural architectures learn a mapping from the skill space to the expert space. An exact graph-based solution to this problem is intractable, and its heuristic variants can only identify sub-optimal solutions. Neural architecture-based solutions, on the other hand, treat experts individually without regard to team dynamics. In this paper, we address the task of forming coherent teams. We propose a neural approach that maximizes the likelihood of successful collaboration among team members while also maximizing the team's coverage of the required skills. Our extensive experiments show that the proposed approach outperforms state-of-the-art methods in terms of both ranking and quality metrics.
Citations: 4
Voyager: Data Discovery and Integration for Onboarding in Data Science
Alex Bogatu, N. Paton, Mark Douthwaite, A. Freitas
DOI: https://doi.org/10.48786/edbt.2022.47; pages 2:537-2:548; published 2022
Citations: 2
Evaluation of Algorithms for Interaction-Sparse Recommendations: Neural Networks don't Always Win
Yasamin Klingler, Claude Lehmann, J. Monteiro, Carlo Saladin, A. Bernstein, Kurt Stockinger
DOI: https://doi.org/10.48786/edbt.2022.42; pages 2:475-2:486; published 2022
Abstract: In recent years, top-K recommender systems with implicit feedback data have gained interest in many real-world business scenarios. In particular, neural networks have shown promising results on these tasks. Traditional recommender systems are built on datasets with frequent user interactions. Insurance recommenders, by contrast, often have access to a very limited number of user interactions, because people buy only a few insurance products. In this paper, we shed new light on top-K recommendation for interaction-sparse recommender problems. We analyze six recommender algorithms:
- a popularity-based baseline,
- two matrix factorization methods (SVD++, ALS),
- one neural network approach (JCA), and
- two combinations of neural network and factorization machine approaches (DeepFM, NeuFM).
We evaluate these algorithms on six interaction-sparse datasets and on one dataset with a less sparse interaction pattern, to elucidate the distinctive behavior of interaction-sparse datasets. On real-world insurance data, DeepFM performs best, followed by JCA and SVD++, which indicates that neural network approaches dominate on this dataset. For the remaining five datasets, however, we observe a different pattern: overall, the matrix factorization method SVD++ is the winner. Surprisingly, the simple popularity-based approach comes second, followed by the neural network approach JCA. In summary, our experimental evaluation on interaction-sparse datasets demonstrates that matrix factorization methods generally outperform neural network approaches. Consequently, traditional, well-established methods should be part of the portfolio of algorithms used to solve real-world interaction-sparse recommender problems.
Citations: 2
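The popularity-based baseline that performs surprisingly well in this evaluation is simple enough to sketch in a few lines of Python. The abstract does not define how popularity is measured, so this version makes two assumptions:
- Popularity is the raw interaction count per item.
- Items the user has already interacted with are excluded.

`popularity_top_k` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def popularity_top_k(interactions: Iterable[Tuple[str, str]], user: str, k: int) -> List[str]:
    """Recommend the k items with the most interactions that `user` has not seen yet.
    `interactions` holds (user_id, item_id) pairs; ties are broken by item id."""
    interactions = list(interactions)
    counts = Counter(item for _, item in interactions)  # assumed popularity measure
    seen = {item for u, item in interactions if u == user}
    ranked = sorted((item for item in counts if item not in seen),
                    key=lambda item: (-counts[item], item))
    return ranked[:k]
```

The ranking is identical for every user apart from the exclusion step, so it requires no per-user model.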
JupySim: Jupyter Notebook Similarity Search System
Misato Horiuchi, Yuya Sasaki, Chuan Xiao, Makoto Onizuka
DOI: https://doi.org/10.48786/edbt.2022.49; pages 2:554-2:557; published 2022
Abstract: Computational notebooks such as Jupyter notebooks are popular for machine learning and data analytics tasks. Numerous reusable computational notebooks are available on the Web. However, searching for them manually is tedious, and so far there are no tools for searching computational notebooks effectively and efficiently. In this paper, we develop JupySim, a system for similarity search over Jupyter notebooks. In JupySim, users specify notebook contents (code, tabular data, libraries, and output formats) as a query. The system then retrieves the top-k Jupyter notebooks whose contents are most similar to that query. A distinguishing characteristic of JupySim is that queries and Jupyter notebooks are modeled as graphs that capture the relationships between code, data, and outputs. JupySim has an intuitive user interface through which users can easily specify the Jupyter notebooks they are looking for. Our demonstration scenarios show that JupySim is effective for finding Jupyter notebooks shared on Kaggle for data science.
Citations: 3
Unsupervised Selectivity Estimation by Integrating Gaussian Mixture Models and an Autoregressive Model
Zizhong Meng, Peizhi Wu, Gao Cong, Rong Zhu, Shuai Ma
DOI: https://doi.org/10.48786/edbt.2022.13; pages 2:247-2:259; published 2022
Abstract: Selectivity estimation is a fundamental database task that has been studied for decades. A recent trend is to use deep learning methods for selectivity estimation, and deep autoregressive models have been reported to achieve excellent accuracy. However, if a relation has continuous attributes with large domain sizes, the search space of query inference on deep autoregressive models can be very large. This results in inaccurate estimation and inefficient inference. To address this challenge, we propose a new model that integrates multiple Gaussian mixture models with a deep autoregressive model. On the one hand, the Gaussian mixture models fit the distributions of continuous attributes and reduce their domain sizes. On the other hand, the deep autoregressive model learns the joint data distribution over the attributes with reduced domains. In experiments, we compare against multiple baselines on 4 real-world datasets containing continuous attributes. The results demonstrate that our model can achieve up to 20 times higher accuracy than the second-best estimator while using less space and inference time.
Citations: 1
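To make the role of the Gaussian mixture models concrete, the following Python sketch shows two things a one-dimensional GMM offers for a single continuous attribute:
- Mapping a value to its most likely component, giving a small discrete domain in the spirit of the domain reduction the abstract describes.
- Estimating the selectivity of a range predicate as the sum over components of w_k * (Phi((b - mu_k) / sigma_k) - Phi((a - mu_k) / sigma_k)).

The mixture parameters are assumed to be given rather than fitted here. The sketch also omits the deep autoregressive model that, in the paper, learns the joint distribution across attributes.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One mixture component (hypothetical values, assumed already fitted)."""
    weight: float
    mean: float
    std: float


def normal_cdf(x: float, mean: float, std: float) -> float:
    """CDF of N(mean, std^2); math.erf maps +/-inf to +/-1, so open ranges work."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def range_selectivity(components: List[Component], low: float, high: float) -> float:
    """Estimated fraction of rows with low <= x <= high under the mixture."""
    return sum(c.weight * (normal_cdf(high, c.mean, c.std) - normal_cdf(low, c.mean, c.std))
               for c in components)


def component_id(components: List[Component], x: float) -> int:
    """Index of the component with the highest weighted density at x,
    i.e. the reduced (discrete) value that would replace x."""
    def weighted_density(c: Component) -> float:
        z = (x - c.mean) / c.std
        return c.weight * math.exp(-0.5 * z * z) / (c.std * math.sqrt(2.0 * math.pi))
    return max(range(len(components)), key=lambda i: weighted_density(components[i]))
```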
Towards A General SIMD Concurrent Approach to Accelerating Integer Compression Algorithms
Juliana Hildebrandt, Dirk Habich, Wolfgang Lehner
DOI: https://doi.org/10.48786/edbt.2022.32; pages 2:414-2:418; published 2022
Abstract: Integer compression algorithms play an important role in column-oriented data systems. Previous research has shown that vectorized implementations of these algorithms can multiply both compression and decompression speeds. Such implementations are based on the Single Instruction Multiple Data (SIMD) parallel paradigm. A scalar compression algorithm usually compresses a block of N consecutive integers. The state-of-the-art SIMD implementation scales the block size to k × N, where k is the number of elements that can be processed simultaneously in a SIMD register. However, this means that as the SIMD register size increases, the block of integer values to be compressed also grows, which can have a negative effect on the compression ratio. In this paper, we analyze this effect and present the idea of a novel general approach to the SIMD implementation of integer compression algorithms that overcomes it. Our novel idea is to concurrently compress k different blocks of size N within SIMD registers. To show the applicability of our idea, we present initial evaluation results for a heavily used compression algorithm. These results show that our approach can lead to more economical use of main memory resources.
Citations: 1
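The compression-ratio effect in the SIMD abstract can be reproduced with a toy size model in Python. The model rests on these assumptions:
- Plain fixed-width bit packing stands in for the unnamed "heavily used compression algorithm": every value in a block is stored with the bit width of the block's largest value.
- Per-block header bytes are ignored.
- Each of the k concurrent blocks is assumed to occupy one SIMD lane.

Only output sizes are modelled, not actual SIMD code.

```python
from typing import List


def bit_width(block: List[int]) -> int:
    """Bits needed per value to pack the non-negative ints in `block` at a fixed width."""
    return max(v.bit_length() for v in block) if block else 0


def packed_bits(block: List[int]) -> int:
    """Payload size of one block under fixed-width bit packing (no header)."""
    return len(block) * bit_width(block)


def size_one_wide_block(values: List[int], k: int, n: int) -> int:
    """Layout described as state of the art: one block of k*n integers, one bit width."""
    if len(values) != k * n:
        raise ValueError("expected exactly k*n values")
    return packed_bits(values)


def size_k_concurrent_blocks(values: List[int], k: int, n: int) -> int:
    """Proposed idea: k independent blocks of n integers, each with its own bit width."""
    if len(values) != k * n:
        raise ValueError("expected exactly k*n values")
    return sum(packed_bits(values[i * n:(i + 1) * n]) for i in range(k))
```

Consider k = 2 and N = 4 with a single outlier. In the single wide block, the outlier forces all eight values to 10 bits (80 bits). With two independent blocks, only the block containing the outlier pays for it (8 + 40 = 48 bits).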