Hoax Detection on Indonesian Tweets using Naïve Bayes Classifier with TF-IDF

Journal of Information System Research (JOSH). Pub Date: 2023-04-30. DOI: 10.47065/josh.v4i3.3317

Ichwanul Muslim Karo Karo, Romia Romia, Sri Dewi, P. Fadilah


Citations: 1

Abstract

Twitter is one of the most popular social media platforms in the world. Indonesia has the fifth-largest Twitter user base in the world, and its users actively express themselves and get information through tweets. A hoax is a lie crafted to appear true, and hoaxes often spread via tweets. Their spread is dangerous because it can cause social discord and misunderstanding, so hoaxes must be countered. This study aims to build a system that detects hoaxes in Indonesian tweets using the Naïve Bayes classifier with Term Frequency-Inverse Document Frequency (TF-IDF). The study collects and annotates tweets taken from hoax posts sent by user accounts, and applies several text preprocessing techniques to prepare the dataset. To find the best hoax prediction model, the dataset is split into training and testing sets under four experimental scenarios, each with a different split. The experimental results show that the model using Naïve Bayes with TF-IDF achieved 64% accuracy and recall, 69% precision, and a 67% F1-score. This result is superior to that of the Naïve Bayes classifier without TF-IDF, which indicates that TF-IDF contributes positively to model performance. Finally, this research contributes by detecting news with a proclivity for hoaxes and filtering content into hoax and non-hoax classes.
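The pipeline the abstract describes (TF-IDF weighting fed into a Naïve Bayes classifier, then scored by accuracy, precision, recall, and F1) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the Laplace smoothing constant `alpha`, the class and function names, and the four toy tweets are all assumptions made for the example, and the abstract does not say how the reported metrics were averaged across classes.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(doc, idf):
    """Raw term counts weighted by IDF; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights (hypothetical helper class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        self.idf = fit_idf(docs)
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Sum the TF-IDF weight of each term within each class.
        self.weights = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for term, w in tfidf(doc, self.idf).items():
                self.weights[label][term] += w
        self.totals = {c: sum(self.weights[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, term, c):
        vocab_size = len(self.idf)
        numerator = self.weights[c].get(term, 0.0) + self.alpha
        return math.log(numerator / (self.totals[c] + self.alpha * vocab_size))

    def predict(self, text):
        vec = tfidf(tokenize(text), self.idf)
        scores = {
            c: self.log_prior[c] + sum(w * self._log_likelihood(t, c) for t, w in vec.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


def binary_metrics(y_true, y_pred, positive="hoax"):
    """Accuracy plus precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Invented toy tweets, for illustration only (not from the paper's dataset).
train_texts = [
    "sebarkan segera hadiah pulsa gratis",
    "klik tautan hadiah gratis sebarkan",
    "jadwal resmi kereta diumumkan hari ini",
    "pemerintah umumkan jadwal resmi libur",
]
train_labels = ["hoax", "hoax", "not_hoax", "not_hoax"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("hadiah gratis sebarkan sekarang"))
print(model.predict("jadwal resmi diumumkan"))
```

To mirror the comparison in the abstract, the same classifier could be run with every IDF value fixed at 1.0, which reduces the features to raw term counts, and the two sets of scores from `binary_metrics` compared on a held-out test split.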

