The Choice of Textual Knowledge Base in Automated Claim Checking

IF 2.9 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Journal of Data and Information Quality, Pub Date: 2023-01-25, DOI: 10.1145/3561389

Dominik Stammbach, Boya Zhang, Elliott Ash

Citations: 1

Abstract

Automated claim checking is the task of determining the veracity of a claim given evidence retrieved from a textual knowledge base of trustworthy facts. While previous work has taken the knowledge base as given and optimized the claim-checking pipeline, we take the opposite approach—taking the pipeline as given, we explore the choice of the knowledge base. Our first insight is that a claim-checking pipeline can be transferred to a new domain of claims with access to a knowledge base from the new domain. Second, we do not find a “universally best” knowledge base—higher domain overlap of a task dataset and a knowledge base tends to produce better label accuracy. Third, combining multiple knowledge bases does not tend to improve performance beyond using the closest-domain knowledge base. Finally, we show that the claim-checking pipeline’s confidence score for selecting evidence can be used to assess whether a knowledge base will perform well for a new set of claims, even in the absence of ground-truth labels.
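The final finding can be sketched in code: the pipeline's confidence in the evidence it selects can rank candidate knowledge bases without ground-truth veracity labels. This is a minimal illustration under stated assumptions, not the authors' procedure. The `scorer` callable is hypothetical. It stands in for the pipeline's evidence-selection model and returns one confidence score per retrieved sentence. The aggregation is also an illustrative choice: take the highest score per claim, then average over claims. The knowledge-base names in the usage example are made up.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: given (claim, knowledge_base_name), return the
# pipeline's evidence-selection confidence for each retrieved sentence.
Scorer = Callable[[str, str], Sequence[float]]


def kb_confidence(claims: Sequence[str], kb: str, scorer: Scorer) -> float:
    """Mean over claims of the highest evidence-selection confidence.

    Uses only the pipeline's own scores; no veracity labels are needed.
    """
    if not claims:
        raise ValueError("need at least one claim")
    per_claim: List[float] = []
    for claim in claims:
        scores = scorer(claim, kb)
        # A claim with no retrieved evidence contributes zero confidence.
        per_claim.append(max(scores) if scores else 0.0)
    return mean(per_claim)


def rank_knowledge_bases(
    claims: Sequence[str], kbs: Sequence[str], scorer: Scorer
) -> List[Tuple[str, float]]:
    """Return (kb, confidence) pairs sorted from most to least confident."""
    ranked = [(kb, kb_confidence(claims, kb, scorer)) for kb in kbs]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up scores standing in for a real evidence-selection model.
    toy_scores = {
        ("claim-1", "kb-a"): [0.9, 0.2],
        ("claim-2", "kb-a"): [0.4],
        ("claim-1", "kb-b"): [0.5],
    }

    def toy_scorer(claim: str, kb: str) -> Sequence[float]:
        return toy_scores.get((claim, kb), [])

    print(rank_knowledge_bases(["claim-1", "claim-2"], ["kb-a", "kb-b"], toy_scorer))
```

Taking the maximum per claim mirrors the idea that a claim is well supported if at least one retrieved sentence is confidently selected. A mean over the top-k scores would be an equally plausible variant.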


Source journal: ACM Journal of Data and Information Quality (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 4.10

Self-citation rate: 4.80%