Searching for Truth in a Database of Statistics

Tien-Duc Cao, I. Manolescu, Xavier Tannier
Proceedings of the 21st International Workshop on the Web and Databases, 2018-06-10
DOI: 10.1145/3201463.3201467 (https://doi.org/10.1145/3201463.3201467)
Citations: 12

Abstract

The proliferation of falsehood and misinformation, in particular through the Web, has led to increasing energy being invested in journalistic fact-checking. Fact-checking journalists typically check the accuracy of a claim against some trusted data source. Statistical databases such as those compiled by state agencies are often used as trusted data sources, as they contain valuable, high-quality information. However, their usability is limited when they are shared in a format such as HTML or spreadsheets: this makes it hard to find the most relevant dataset for checking a specific claim, or to quickly extract from a dataset the best answer to a given query. We present a novel algorithm enabling the exploitation of such statistical tables, by (i) identifying the statistical datasets most relevant for a given fact-checking query, and (ii) extracting from each dataset the best specific (precise) query answer it may contain. We have implemented our approach and experimented on the complete corpus of statistics obtained from INSEE, the French national statistics institute. Our experiments and comparisons demonstrate the effectiveness of our proposed method.
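The two steps named in the abstract, (i) ranking datasets by relevance to a query and (ii) extracting the most specific answer from a dataset, can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not describe its scoring, so plain keyword overlap is swapped in here as a stand-in. The `Table` class, the function names, and all table contents below are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical model of one statistical table (not from the paper)."""
    title: str
    row_headers: list[str]
    col_headers: list[str]
    cells: list[list[float]]  # cells[i][j] sits at row_headers[i], col_headers[j]


def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization; a real system would do far more."""
    return set(text.lower().split())


def overlap(query_tokens: set[str], text: str) -> int:
    """Number of query tokens that also occur in the text."""
    return len(query_tokens & tokens(text))


def rank_tables(query: str, tables: list[Table]) -> list[Table]:
    """Step (i): order tables by keyword overlap with their title and headers."""
    q = tokens(query)

    def score(table: Table) -> int:
        metadata = " ".join([table.title, *table.row_headers, *table.col_headers])
        return overlap(q, metadata)

    return sorted(tables, key=score, reverse=True)


def best_cell(query: str, table: Table) -> tuple[str, str, float]:
    """Step (ii): pick the cell whose row and column headers best match the query."""
    q = tokens(query)
    i = max(range(len(table.row_headers)),
            key=lambda r: overlap(q, table.row_headers[r]))
    j = max(range(len(table.col_headers)),
            key=lambda c: overlap(q, table.col_headers[c]))
    return table.row_headers[i], table.col_headers[j], table.cells[i][j]


# Made-up placeholder tables; the numbers carry no real-world meaning.
tables = [
    Table("population by region", ["paris", "lyon"], ["2015", "2016"],
          [[10.0, 20.0], [30.0, 40.0]]),
    Table("unemployment rate by region", ["paris", "lyon"], ["2015", "2016"],
          [[1.0, 2.0], [3.0, 4.0]]),
]
query = "unemployment rate lyon 2016"

top = rank_tables(query, tables)[0]
answer = best_cell(query, top)
print(top.title, answer)
```

Separating ranking from cell extraction mirrors the two-part structure the abstract describes; each half could be replaced by a stronger relevance model without touching the other.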