Proceedings of the 21st International Workshop on the Web and Databases: Latest Articles

Cleaning Data with Constraints and Experts

Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201464

A. Assadi, T. Milo, Slava Novgorodov

Abstract: Popular techniques for data cleaning use integrity constraints to identify errors in the data and to resolve them automatically, e.g., by using predefined priorities among possible updates and finding a minimal repair that resolves the violations. Such automatic solutions, however, cannot ensure the precision of the repairs, since they do not have enough evidence about the actual errors and may in fact lead to wrong results with respect to the ground truth. It has thus been suggested to use domain experts to examine the potential updates and choose which should be applied to the database. However, the sheer volume of the databases and the large number of possible updates that may resolve a given constraint violation may make such a manual examination prohibitively expensive. The goal of the DANCE system presented here is to help optimize the experts' work and reduce as much as possible the number of questions (update verifications) they need to address. Given a constraint violation, our algorithm identifies the suspicious tuples whose update may contribute (directly or indirectly) to the constraint resolution, as well as the possible dependencies among them. Using this information, it builds a graph whose nodes are the suspicious tuples and whose weighted edges capture the likelihood that an error in one tuple occurs and affects the other. A PageRank-style algorithm then allows us to identify the most beneficial tuples to ask about first. Incremental graph maintenance is used to ensure interactive response times. We implemented our solution in the DANCE system and show its effectiveness and efficiency through a comprehensive suite of experiments.
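The ranking step described in the abstract can be illustrated with a minimal sketch. This is a generic weighted PageRank, not the DANCE implementation: the function name `weighted_pagerank`, the tuple identifiers `t1` to `t4`, the edge weights, the edge direction (from a tuple to the tuple it may affect), and the damping factor are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of a weighted directed graph.

    edges: dict mapping (src, dst) -> positive weight (hypothetical likelihoods).
    Returns a dict mapping node -> score; scores sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    n_nodes = len(nodes)

    # Total outgoing weight per node, used to normalise edge weights.
    out_weight = defaultdict(float)
    for (src, _dst), w in edges.items():
        out_weight[src] += w

    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / n_nodes + damping * dangling / n_nodes
        new_rank = {n: base for n in nodes}
        # Each node passes its score along its edges, proportionally to weight.
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical suspicious tuples and made-up edge weights.
edges = {
    ("t1", "t2"): 0.9,
    ("t1", "t4"): 0.1,
    ("t3", "t2"): 0.6,
    ("t2", "t4"): 0.3,
    ("t4", "t2"): 0.2,
}
scores = weighted_pagerank(edges)
# Tuples sorted from highest to lowest score: a candidate order for expert questions.
ask_order = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the highest-scoring tuple would be put to the expert first; how DANCE actually derives the weights and maintains the graph incrementally is not specified in this listing.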

Citations: 13

Searching for Truth in a Database of Statistics

Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201467

Tien-Duc Cao, I. Manolescu, Xavier Tannier

Abstract: The proliferation of falsehood and misinformation, in particular through the Web, has led to increasing energy being invested in journalistic fact-checking. Fact-checking journalists typically check the accuracy of a claim against some trusted data source. Statistical databases such as those compiled by state agencies are often used as trusted data sources, as they contain valuable, high-quality information. However, their usability is limited when they are shared in a format such as HTML or spreadsheets: this makes it hard to find the most relevant dataset for checking a specific claim, or to quickly extract from a dataset the best answer to a given query. We present a novel algorithm enabling the exploitation of such statistical tables, by (i) identifying the statistical datasets most relevant for a given fact-checking query, and (ii) extracting from each dataset the best specific (precise) query answer it may contain. We have implemented our approach and experimented on the complete corpus of statistics obtained from INSEE, the French national statistics institute. Our experiments and comparisons demonstrate the effectiveness of our proposed method.

Citations: 12

Proceedings of the 21st International Workshop on the Web and Databases

Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463

Citations: 0

Leveraging Wikipedia Table Schemas for Knowledge Graph Augmentation

Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201468

Matteo Cannaviccio, Lorenzo Ariemma, Denilson Barbosa, P. Merialdo

Citations: 12

DataVizard

Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201465

Rema Ananthanarayanan, P. Lohia, Srikanta J. Bedathur

Citations: 12

Processing Class-Constraint K-NN Queries with MISP

Proceedings of the 21st International Workshop on the Web and Databases Pub Date : 2018-06-10 DOI: 10.1145/3201463.3201466

Evica Milchevski, Fabian Neffgen, S. Michel

Citations: 1