On the use of text augmentation for stance and fake news detection

Impact factor: 2.7 · JCR Q2 (Computer Science, Information Systems)
Ilhem Salah, Khaled Jouini, O. Korbaa
Journal: Journal of Information and Telecommunication, volume 15 6, pages 359 - 375
Publication date: 2023-04-19
Publication type: Journal Article
DOI: 10.1080/24751839.2023.2198820 (https://doi.org/10.1080/24751839.2023.2198820)
Citations: 2

Abstract

Data Augmentation (DA) synthesizes new training instances by applying transformations to available ones. DA has several well-known benefits: (i) increasing generalization ability; (ii) mitigating data scarcity; and (iii) helping resolve class imbalance. In this work, we investigate the use of DA for stance and fake news detection. In the first part of our work, we explore the effect of various DA techniques on the performance of common classification algorithms. Our study reveals that the motto ‘the more, the better’ is the wrong approach to text augmentation and that there is no one-size-fits-all text augmentation technique. The second part of our work builds on these results to propose a novel augmentation-based ensemble learning approach. The proposed approach uses text augmentation to enhance the diversity and accuracy of the base learners, and thus the predictive performance of the ensemble. The third part of our work experimentally investigates the use of DA to cope with class imbalance, which is very common in stance and fake news detection and often results in biased models. We show how, and to what extent, text augmentation can help resolve moderate and severe imbalance.
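To make the idea of augmentation-based rebalancing concrete, here is a minimal sketch in Python. The abstract does not name the specific augmentation techniques or the balancing procedure the authors used, so everything below is an assumption chosen for illustration: the two token-level transformations (random swap and random deletion), the function names, and the strategy of oversampling each minority class up to the majority count.

```python
import random
from collections import Counter


def random_swap(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    if len(tokens) < 2:
        return list(tokens)
    out = list(tokens)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one token."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, rng):
    """Apply one randomly chosen transformation (hypothetical choice of ops)."""
    tokens = text.split()
    if not tokens:
        return text
    op = rng.choice([random_swap, random_deletion])
    return " ".join(op(tokens, rng))


def balance_with_augmentation(texts, labels, seed=0):
    """Add augmented copies of minority-class texts until every class
    has as many instances as the largest class. Originals are kept first."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_texts, new_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == label]
        for _ in range(target - count):
            new_texts.append(augment(rng.choice(pool), rng))
            new_labels.append(label)
    return new_texts, new_labels
```

Calling `balance_with_augmentation(texts, labels)` returns the original instances followed by the synthetic ones. Given the abstract's finding that more augmentation is not always better, the amount of synthetic data (here fixed at full balance) would in practice be a parameter to tune rather than a default to trust.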
Source journal metrics: CiteScore 7.50; self-citation rate 0.00%; articles published 18; review time 27 weeks