Hoax Detection on Indonesian Tweets using Naïve Bayes Classifier with TF-IDF

Ichwanul Muslim Karo Karo, Romia Romia, Sri Dewi, P. Fadilah
Journal of Information System Research (JOSH), published 2023-04-30
DOI: 10.47065/josh.v4i3.3317 (https://doi.org/10.47065/josh.v4i3.3317)
Citation count: 1

Abstract

Twitter is one of the most popular social media platforms in the world. Indonesia has the fifth-largest Twitter user base worldwide, and its users actively express themselves and obtain information through tweets. A hoax is a falsehood presented as if it were true, and hoaxes are often spread via tweets. Their spread is dangerous because it can cause misunderstanding and even social discord, so hoaxes must be countered. This study builds a system to detect hoaxes in Indonesian tweets using the Naïve Bayes classifier with Term Frequency-Inverse Document Frequency (TF-IDF) weighting. The study collects and annotates tweets from hoax-related posts sent by user accounts and applies several text preprocessing techniques to prepare the dataset. To find the best hoax prediction model, the dataset is split into training and testing sets under four experimental scenarios, each defined by how the dataset is split. The experimental results show that the hoax prediction model using Naïve Bayes with TF-IDF achieved 64% accuracy, 64% recall, 69% precision, and a 67% F1-score. This result is superior to that of the Naïve Bayes classifier without TF-IDF, which indicates that TF-IDF contributes positively to model performance. Finally, this research contributes a means of detecting news with a proclivity for hoaxes and filtering content into hoax and non-hoax classes.
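The approach named in the abstract, Naïve Bayes over TF-IDF features, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the Laplace smoothing constant `alpha`, the class name `NaiveBayesTfidf`, and the four toy English training texts are all assumptions made for illustration, and the paper's actual preprocessing steps and split ratios are not stated in the abstract.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF (an assumed variant): idf(t) = ln((1 + N) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    # Raw term count times IDF; terms unseen during fitting are dropped.
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


class NaiveBayesTfidf:
    """Multinomial Naive Bayes trained on TF-IDF weights instead of raw counts."""

    def fit(self, texts, labels, alpha=1.0):
        docs = [tokenize(t) for t in texts]
        self.idf = fit_idf(docs)
        self.classes = sorted(set(labels))
        # Sum the TF-IDF weight of every term within each class.
        totals = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for term, weight in tfidf(doc, self.idf).items():
                totals[label][term] += weight
        vocab_size = len(self.idf)
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + alpha * vocab_size
            # Laplace-smoothed log-likelihood of each vocabulary term given class c.
            self.log_lik[c] = {
                t: math.log((totals[c][t] + alpha) / denom) for t in self.idf
            }
        return self

    def predict(self, text):
        weights = tfidf(tokenize(text), self.idf)
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_lik[c][t] for t, w in weights.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy data invented for this sketch; it is not drawn from the paper's dataset.
train_texts = [
    "free money giveaway click link now",
    "miracle cure click link share now",
    "government announces new school schedule",
    "weather agency reports heavy rain today",
]
train_labels = ["hoax", "hoax", "not_hoax", "not_hoax"]

model = NaiveBayesTfidf().fit(train_texts, train_labels)
print(model.predict("click link for free miracle cure"))
print(model.predict("weather agency reports rain"))
```

Dropping the `idf[t]` factor in `tfidf` gives the plain count-based Naïve Bayes that the abstract uses as its baseline, so the same sketch can be used to compare the two variants on a labeled dataset.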