Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text

Prerona Tarannum, Firoj Alam, Md. Arid Hasan, S. R. H. Noori
{"title":"Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text","authors":"Prerona Tarannum, Firoj Alam, Md. Arid Hasan, S. R. H. Noori","doi":"10.48550/arXiv.2207.07308","DOIUrl":null,"url":null,"abstract":"The wide use of social media and digital technologies facilitates sharing various news and information about events and activities. Despite sharing positive information misleading and false information is also spreading on social media. There have been efforts in identifying such misleading information both manually by human experts and automatic tools. Manual effort does not scale well due to the high volume of information, containing factual claims, are appearing online. Therefore, automatically identifying check-worthy claims can be very useful for human experts. In this study, we describe our participation in Subtask-1A: Check-worthiness of tweets (English, Dutch and Spanish) of CheckThat! lab at CLEF 2022. We performed standard preprocessing steps and applied different models to identify whether a given text is worthy of fact checking or not. We use the oversampling technique to balance the dataset and applied SVM and Random Forest (RF) with TF-IDF representations. We also used BERT multilingual (BERT-m) and XLM-RoBERTa-base pre-trained models for the experiments. We used BERT-m for the official submissions and our systems ranked as 3rd, 5th, and 12th in Spanish, Dutch, and English, respectively. 
In further experiments, our evaluation shows that transformer models (BERT-m and XLM-RoBERTa-base) outperform the SVM and RF in Dutch and English languages where a different scenario is observed for Spanish.","PeriodicalId":232729,"journal":{"name":"Conference and Labs of the Evaluation Forum","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference and Labs of the Evaluation Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2207.07308","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The wide use of social media and digital technologies facilitates the sharing of news and information about events and activities. Alongside useful information, however, misleading and false information also spreads on social media. There have been efforts to identify such misleading information, both manually by human experts and with automatic tools. Manual effort does not scale well because of the high volume of information containing factual claims that appears online; therefore, automatically identifying check-worthy claims can be very useful for human experts. In this study, we describe our participation in Subtask-1A: Check-worthiness of tweets (English, Dutch, and Spanish) of the CheckThat! lab at CLEF 2022. We performed standard preprocessing steps and applied different models to identify whether a given text is worthy of fact-checking. We used an oversampling technique to balance the dataset and applied SVM and Random Forest (RF) classifiers with TF-IDF representations. We also used the multilingual BERT (BERT-m) and XLM-RoBERTa-base pre-trained models in our experiments. We used BERT-m for the official submissions, and our systems ranked 3rd, 5th, and 12th in Spanish, Dutch, and English, respectively. In further experiments, our evaluation shows that the transformer models (BERT-m and XLM-RoBERTa-base) outperform SVM and RF for Dutch and English, whereas a different pattern is observed for Spanish.
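The classical baseline described above (TF-IDF features, oversampling to balance the classes, then SVM and RF classifiers) can be sketched as follows. This is a minimal illustration assuming scikit-learn; the toy tweets, the simple random-duplication oversampler, and all hyperparameters are assumptions for demonstration, not the authors' actual data or settings.

```python
# Sketch of a TF-IDF + oversampling + SVM/RF check-worthiness baseline.
# Toy data and parameters are illustrative only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

# Toy tweets: 1 = check-worthy (contains a factual claim), 0 = not.
texts = [
    "The unemployment rate fell to 3.5 percent last month",
    "Vaccines were given to 2 million people this week",
    "The new law cuts taxes for 40 percent of households",
    "Good morning everyone, have a great day",
    "I love this song so much",
    "Can't wait for the weekend",
    "What a beautiful sunset tonight",
    "Just had the best coffee ever",
]
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0])

vec = TfidfVectorizer(lowercase=True, ngram_range=(1, 2))
X = vec.fit_transform(texts)

def oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows until all classes are balanced."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        rows = np.where(y == c)[0]
        extra = rng.choice(rows, size=target - n, replace=True)
        idx.extend(rows.tolist() + extra.tolist())
    idx = np.array(idx)
    return X[idx], y[idx]

X_bal, y_bal = oversample(X, labels)

svm = LinearSVC().fit(X_bal, y_bal)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)

sample = vec.transform(["Inflation reached 9 percent in June"])
print("SVM:", svm.predict(sample)[0], "RF:", rf.predict(sample)[0])
```

In the paper's actual setup, the transformer runs (BERT-m, XLM-RoBERTa-base) replace this feature-based pipeline with fine-tuned pre-trained encoders, which is what outperformed SVM/RF for Dutch and English.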