A Multilingual Dataset for Identification of Factual Claims in Indian Twitter

Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation; Pub Date: 2022-12-09; DOI: 10.1145/3574318.3574348

Subhabrata Dutta, Rudra Dhar, Prantik Guha, Arpan Murmu, Dipankar Das

Abstract

The need for automated fact-checking grows more pressing every day as misinformation spreads through an ever-increasing stream of online content. We focus on fine-grained labelling of factual information in tweets, with the aim of supporting fact-checking systems that can provide better justifications. In this paper, we present a token-level annotation of factual claims in tweets from Indian Twitter. To handle the multilingual variety of the Indian diaspora, we include tweets in English, Bengali, Hindi, and their code-mixed variants. To the best of our knowledge, this dataset is the first of its kind in terms of both its labelling scheme and its data sources.
