Proceedings of the 21st International Workshop on the Web and Databases: Latest Publications

Cleaning Data with Constraints and Experts
Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201464
A. Assadi, T. Milo, Slava Novgorodov
Abstract: Popular techniques for data cleaning use integrity constraints to identify errors in the data and to resolve them automatically, e.g., by using predefined priorities among possible updates and finding a minimal repair that resolves the violations. Such automatic solutions, however, cannot ensure the precision of the repairs, since they do not have enough evidence about the actual errors and may in fact lead to wrong results with respect to the ground truth. It has thus been suggested to use domain experts to examine the potential updates and choose which should be applied to the database. However, the sheer volume of the databases and the large number of possible updates that may resolve a given constraint violation may make such a manual examination prohibitively expensive. The goal of the DANCE system presented here is to help optimize the experts' work and reduce as much as possible the number of questions (update verifications) they need to address. Given a constraint violation, our algorithm identifies the suspicious tuples whose update may contribute (directly or indirectly) to the constraint resolution, as well as the possible dependencies among them. Using this information, it builds a graph whose nodes are the suspicious tuples and whose weighted edges capture the likelihood of an error in one tuple to occur and affect the other. A PageRank-style algorithm then allows us to identify the most beneficial tuples to ask about first. Incremental graph maintenance is used to ensure interactive response time. We implemented our solution in the DANCE system and show its effectiveness and efficiency through a comprehensive suite of experiments.
Citations: 13
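The ranking step described in the DANCE abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rank_suspicious_tuples`, the edge direction, the example weights, and the damping factor are all assumptions made for illustration, and a plain weighted PageRank power iteration stands in for whatever "PageRank-style" variant the paper actually uses.

```python
from collections import defaultdict


def rank_suspicious_tuples(edges, damping=0.85, iterations=50):
    """Rank nodes of a weighted directed graph by a PageRank-style score.

    edges: dict mapping (source, target) -> positive weight. Here a node
    stands for a suspicious tuple (hypothetical encoding, not from the paper).
    Returns (nodes sorted by descending score, dict of scores).
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)

    # Total outgoing weight per node, used to normalize edge weights.
    out_weight = defaultdict(float)
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is spread uniformly.
        dangling = sum(score[node] for node in nodes if out_weight[node] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_score = {node: base for node in nodes}
        for (src, dst), weight in edges.items():
            new_score[dst] += damping * score[src] * weight / out_weight[src]
        score = new_score

    ranking = sorted(nodes, key=lambda node: score[node], reverse=True)
    return ranking, score


if __name__ == "__main__":
    # Hypothetical graph: an edge (a, b) with weight w is read here as
    # "an error in a is likely (with strength w) to affect b".
    example_edges = {("t1", "t3"): 0.9, ("t2", "t3"): 0.8, ("t3", "t1"): 0.1}
    order, scores = rank_suspicious_tuples(example_edges)
    for tuple_id in order:
        print(tuple_id, round(scores[tuple_id], 4))
```

Under these assumptions, the expert would be asked about the tuples in the returned order. The abstract also mentions incremental graph maintenance, which this sketch does not attempt; it recomputes scores from scratch.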
Searching for Truth in a Database of Statistics
Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201467
Tien-Duc Cao, I. Manolescu, Xavier Tannier
Abstract: The proliferation of falsehood and misinformation, in particular through the Web, has led to increasing energy being invested into journalistic fact-checking. Fact-checking journalists typically check the accuracy of a claim against some trusted data source. Statistic databases such as those compiled by state agencies are often used as trusted data sources, as they contain valuable, high-quality information. However, their usability is limited when they are shared in a format such as HTML or spreadsheets: this makes it hard to find the most relevant dataset for checking a specific claim, or to quickly extract from a dataset the best answer to a given query. We present a novel algorithm enabling the exploitation of such statistic tables, by (i) identifying the statistic datasets most relevant for a given fact-checking query, and (ii) extracting from each dataset the best specific (precise) query answer it may contain. We have implemented our approach and experimented on the complete corpus of statistics obtained from INSEE, the French national statistic institute. Our experiments and comparisons demonstrate the effectiveness of our proposed method.
Citations: 12
Proceedings of the 21st International Workshop on the Web and Databases
Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463
Citations: 0
Leveraging Wikipedia Table Schemas for Knowledge Graph Augmentation
Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201468
Matteo Cannaviccio, Lorenzo Ariemma, Denilson Barbosa, P. Merialdo
Abstract: General solutions to augment Knowledge Graphs (KGs) with facts extracted from Web tables aim to associate pairs of columns from the table with a KG relation, based on the matches between pairs of entities in the table and facts in the KG. These approaches suffer from intrinsic limitations due to the incompleteness of the KGs. In this paper we investigate an alternative solution, which leverages the patterns that occur in the schemas of a large corpus of Wikipedia tables. Our experimental evaluation, which used DBpedia as the reference KG, demonstrates the advantages of our approach over state-of-the-art solutions and reveals that we can extract more than 1.7M facts with an estimated accuracy of 0.81, even from tables that do not expose any fact in the KG.
Citations: 12
DataVizard
Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201465
Rema Ananthanarayanan, P. Lohia, Srikanta J. Bedathur
Abstract: Selecting the appropriate visual presentation of the data, such that it not only preserves the semantics but also provides an intuitive summary of the data, is an important and often final step of data analytics. Unfortunately, this step also involves significant human effort, from selecting groups of columns in the structured results of the analytics stages to selecting the right visualization by experimenting with various alternatives. In this paper, we describe our DataVizard system, aimed at reducing this overhead by automatically recommending the most appropriate visual presentation for the structured result. Specifically, we consider the following two scenarios: first, when one needs to visualize the results of a structured query such as SQL; and second, when one has acquired a data table with an associated short description (e.g., tables from the Web). Using a corpus of real-world database queries (and their results) and a number of statistical tables crawled from the Web, we show that DataVizard is capable of recommending visual presentations with high accuracy.
Citations: 12
Processing Class-Constraint K-NN Queries with MISP
Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201466
Evica Milchevski, Fabian Neffgen, S. Michel
Abstract: In this work, we consider processing k-nearest-neighbor (k-NN) queries with the additional requirement that the result objects are of a specific type. To solve this problem, we propose an approach based on a combination of an inverted index and a state-of-the-art similarity search index structure for efficiently pruning the search space early on. Furthermore, we provide a cost model and an extensive experimental study that analyzes the performance of the proposed index structure under different configurations, with the aim of finding the most efficient one for the dataset being searched.
Citations: 1
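To make the query type in the abstract above concrete, here is a minimal sketch of a class-constrained k-NN lookup. It is not the MISP index from the paper: the inverted index maps a class label to object ids, and a plain linear scan with Euclidean distance is swapped in for the similarity search index structure. The function names, the data layout, and the example objects are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def build_class_index(objects):
    """Build an inverted index: class label -> list of object ids.

    objects: dict mapping object id -> (class_label, vector).
    """
    index = defaultdict(list)
    for obj_id, (label, _vector) in objects.items():
        index[label].append(obj_id)
    return index


def class_constrained_knn(objects, class_index, query, label, k):
    """Return ids of the k objects of class `label` closest to `query`.

    Only candidates listed under `label` in the inverted index are scored,
    so objects of other classes are pruned before any distance is computed.
    The linear scan over candidates is a stand-in for a real similarity index.
    """
    candidates = class_index.get(label, [])
    return heapq.nsmallest(
        k, candidates, key=lambda obj_id: math.dist(query, objects[obj_id][1])
    )


if __name__ == "__main__":
    # Hypothetical points of interest with a class label and 2-D coordinates.
    places = {
        "a": ("cafe", (0.0, 0.0)),
        "b": ("cafe", (1.0, 1.0)),
        "c": ("bar", (0.1, 0.1)),
        "d": ("cafe", (5.0, 5.0)),
    }
    index = build_class_index(places)
    print(class_constrained_knn(places, index, (0.0, 0.0), "cafe", 2))
```

The sketch only shows the filtering semantics. The cost model and the choice among index configurations discussed in the abstract are outside its scope.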