A Multilingual Dataset for Identification of Factual Claims in Indian Twitter

Subhabrata Dutta, Rudra Dhar, Prantik Guha, Arpan Murmu, Dipankar Das
{"title":"A Multilingual Dataset for Identification of Factual Claims in Indian Twitter","authors":"Subhabrata Dutta, Rudra Dhar, Prantik Guha, Arpan Murmu, Dipankar Das","doi":"10.1145/3574318.3574348","DOIUrl":null,"url":null,"abstract":"The need for automated fact-checking is getting prominent with every passing day as the spread of misinformation is swelling over the ever-increasing stream of online content. We focus on fine-grained labelling of factual information in tweets to facilitate better fact-checking systems capable of providing improved justifications. In this paper, we present a token-level annotation of factual claims in tweets from Indian Twitter. To deal with the multilingual variety of the Indian diaspora, we deal with tweets in English, Bengali, Hindi, and their codemixed variants. To the best of our knowledge, this dataset is first of kind, both in terms of labelling scheme as well as data sources.","PeriodicalId":270700,"journal":{"name":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3574318.3574348","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The need for automated fact-checking is getting prominent with every passing day as the spread of misinformation is swelling over the ever-increasing stream of online content. We focus on fine-grained labelling of factual information in tweets to facilitate better fact-checking systems capable of providing improved justifications. In this paper, we present a token-level annotation of factual claims in tweets from Indian Twitter. To deal with the multilingual variety of the Indian diaspora, we deal with tweets in English, Bengali, Hindi, and their codemixed variants. To the best of our knowledge, this dataset is first of kind, both in terms of labelling scheme as well as data sources.
印度推特中事实声明识别的多语言数据集
随着错误信息的传播在不断增加的在线内容流中膨胀,对自动事实核查的需求日益突出。我们专注于对推文中的事实信息进行细粒度标记,以促进能够提供改进的理由的更好的事实核查系统。在本文中,我们提出了一个令牌级注释的事实主张来自印度推特。为了处理印度侨民的多语言多样性,我们处理英语,孟加拉语,印地语及其编码混合变体的推文。据我们所知,这个数据集在标签方案和数据源方面都是首创的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信