Hoax News Detection Using Passive Aggressive Classifier And TfidfVectorizer

Maulana Fajar Lazuardi, Renaldy Hiunarto, Kareena Futri Ramadhani, Noviandi Noviandi, Riya Widayanti, Muhamad Hadi Arfian
JURNAL TEKNIK INFORMATIKA, published 2023-12-22. DOI: 10.15408/jti.v16i2.34084 (https://doi.org/10.15408/jti.v16i2.34084)

Abstract

Indonesia has one of the largest populations of social media users in the world, reaching 167 million in January 2023. These users are spread across various platforms, including Twitter with 24 million users. With so many users on Twitter, the validation of information is increasingly neglected. Moreover, the news that social media users read tends to be tailored to their individual tastes. The result is visible in the large volume of fake news (hoaxes) circulating in society through social media. An accurate machine learning model is therefore needed to classify news as "real" or "hoax". This study applies TfidfVectorizer and the Passive Aggressive Classifier to a dataset shared on Kaggle, whose contents were collected from Twitter over a five-year span, 2015-2020. Across the pipeline, from preprocessing through to the confusion matrix, the model performs as expected, achieving Accuracy, Precision, and Recall scores of 82.44%, 80.66%, and 82.44%, respectively. The confusion matrix also shows that the dataset used contains more "real" news than "hoax" news: the model predicted 1059 items as real and 211 as hoaxes, whereas the actual labels are 1106 real and 164 hoax.
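
The abstract names TfidfVectorizer and the Passive Aggressive Classifier but gives no implementation details, so the following is only a minimal sketch of the two ideas in plain Python, not the authors' pipeline. It assumes whitespace tokenisation, smoothed IDF with L2 normalisation (the defaults of scikit-learn's TfidfVectorizer), and the PA-I update rule with a hypothetical aggressiveness parameter `C`. The helper names (`fit_idf`, `tfidf`, `train_pa`, `predict`) and the four-sentence toy corpus are invented for illustration and are unrelated to the Kaggle dataset.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.lower().split()))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    tf = Counter(t for t in doc.lower().split() if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def score(w, x):
    """Dot product between the weight dict and a sparse document vector."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_pa(X, y, C=1.0, epochs=5):
    """PA-I: stay passive when the margin is at least 1, otherwise take the
    smallest step that fixes the example, capped at C (assumed value)."""
    w = {}
    for _ in range(epochs):
        for x, label in zip(X, y):
            loss = max(0.0, 1.0 - label * score(w, x))  # hinge loss
            if loss > 0.0 and x:
                tau = min(C, loss / sum(v * v for v in x.values()))
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + tau * label * v
    return w


def predict(w, x):
    """+1 = real, -1 = hoax."""
    return 1 if score(w, x) >= 0 else -1


# Toy corpus invented for illustration; it is NOT the dataset used in the study.
train_docs = [
    "government confirms new vaccine schedule",
    "shocking miracle cure doctors hate",
    "ministry publishes official election results",
    "share this secret miracle trick now",
]
train_labels = [1, -1, 1, -1]  # +1 = real, -1 = hoax

idf = fit_idf(train_docs)
X = [tfidf(d, idf) for d in train_docs]
w = train_pa(X, train_labels)

print([predict(w, x) for x in X])                     # [1, -1, 1, -1]
print(predict(w, tfidf("miracle cure secret", idf)))  # -1 (hoax)
```

The classifier is "passive" whenever an example is already classified with a margin of at least 1 and "aggressive" otherwise, which is why a single pass over a stream of texts can be enough to train it. Accuracy, precision, and recall like those reported above would then be computed by comparing such predictions against held-out labels.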