A semantic approach for sarcasm identification for preventing fake news spreading on social networks

Fethi Fkih, Delel Rhouma, Hajar Alghofaily
International Journal of Information Technology. Published 2024-08-27. DOI: 10.1007/s41870-024-02156-7 (https://doi.org/10.1007/s41870-024-02156-7)

Abstract

Misinterpreting satirical posts can contribute to the spread of misinformation and potentially be a source of what is commonly referred to as “fake news”. Satire is a form of humor that often involves exaggeration, irony, or ridicule to comment on or criticize a particular subject. While satirical content is not intended to be taken literally, individuals sometimes misinterpret it, leading to the dissemination of false information. We can therefore reduce the spread of fake news by preventing people from misinterpreting satirical posts. However, sarcasm recognition is considered a challenging task in the Sentiment Analysis domain. Even for humans, it can be difficult to recognize irony and sarcasm, which convey sharp, bitter remarks or criticism in ambiguous and unclear natural language. This makes identification even more difficult for an automated model. In this paper, we carry out an in-depth literature review of the main approaches used for sarcasm detection, especially those based on Machine Learning (ML) models. We then conduct a study with a series of binary classification models that exploit a variety of statistical and semantic features. Our experiments were carried out on a Twitter dataset obtained from SemEval-2018 Task 3. An extensive evaluation of each set of classifiers demonstrates the effectiveness of our proposed model in detecting and identifying sarcastic content in tweets. Finally, we compare the performance of machine learning models using our proposed features with our baseline and with the state of the art on the same dataset. Using a Support Vector Machine (SVM) model with the proposed features, we outperform the state of the art, obtaining an accuracy of 79.46% and an F-score of 79.66%, which is considered a promising result in this field.
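To make the two reported figures concrete, the following is a minimal sketch of how accuracy and F-score are typically computed for a binary sarcasm classifier. It assumes that "F-score" means the F1 score of the sarcastic class, which the abstract does not state explicitly. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class.

    Assumption: F-score is taken to be F1 (harmonic mean of precision
    and recall) computed on the positive (sarcastic) class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels: 1 = sarcastic, 0 = not sarcastic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy counts all correct predictions, while F1 only rewards correct handling of the sarcastic class, so the two numbers can diverge when the classes are imbalanced. Reporting both, as the abstract does, gives a more complete picture than either alone.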

