A Multilingual Dataset for Identification of Factual Claims in Indian Twitter
Subhabrata Dutta, Rudra Dhar, Prantik Guha, Arpan Murmu, Dipankar Das
Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, 2022-12-09
DOI: 10.1145/3574318.3574348 (https://doi.org/10.1145/3574318.3574348)
Citations: 1
Abstract
The need for automated fact-checking grows more pressing by the day as misinformation spreads through the ever-increasing stream of online content. We focus on fine-grained labelling of factual information in tweets, with the aim of supporting fact-checking systems that can provide better justifications. In this paper, we present a token-level annotation of factual claims in tweets from Indian Twitter. To cover the multilingual variety of the Indian diaspora, we include tweets in English, Bengali, Hindi, and their code-mixed variants. To the best of our knowledge, this dataset is the first of its kind in both its labelling scheme and its data sources.