Spam Classification on 2019 Indonesian President Election Youtube Comments Using Multinomial Naïve-Bayes

Jonathan Radot Fernando, R. Budiraharjo, Emeraldi Haganusa
Indonesian Journal of Artificial Intelligence and Data Mining, published 2019-07-08
DOI: https://doi.org/10.24014/ijaidm.v2i1.6445
Citation count: 1

Abstract

Text classification is used in many areas of technology, such as spam classification, news categorization, and auto-correct in text messaging. One of the most popular algorithms for text classification today is Multinomial Naïve-Bayes. This paper explains how the Naïve-Bayes assumption is used to classify YouTube comments on the 2019 Indonesian presidential election. The algorithm predicts one of two classes: spam or not spam. Spam messages are defined as racist comments, advertising comments, and unsolicited comments. Text is represented with the bag-of-words method, which defines a text as the multiset of its words. The algorithm then calculates the probability of each word given the class (spam or not spam). The main difference between the standard Naïve-Bayes algorithm and Multinomial Naïve-Bayes is the way the algorithm treats the data: Multinomial Naïve-Bayes treats the data as frequency data, which makes it suitable for text classification tasks.
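The steps the abstract describes (bag-of-words representation, per-class word probabilities, and a spam or not spam prediction) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The whitespace tokenizer, the add-one (Laplace) smoothing parameter alpha, the function names, and the four toy comments are all assumptions made for illustration; none of them come from the paper or its dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on word frequencies.

    alpha is an assumed Laplace smoothing constant; the abstract does not
    say whether or how the paper smooths its estimates.
    """
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> bag of words (multiset)
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))

    vocab = {w for counts in word_counts.values() for w in counts}
    n_docs = len(labels)
    log_prior = {c: math.log(k / n_docs) for c, k in class_doc_counts.items()}

    log_likelihood = {}
    for c in class_doc_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, text):
    """Return the class with the highest log posterior score."""
    log_prior, log_likelihood, vocab = model
    # Words never seen in training are ignored.
    bag = Counter(w for w in tokenize(text) if w in vocab)
    scores = {
        c: log_prior[c] + sum(n * log_likelihood[c][w] for w, n in bag.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy comments, invented for this sketch.
docs = [
    "buy cheap followers now",
    "cheap promo click link now",
    "great debate tonight",
    "i support the debate result",
]
labels = ["spam", "spam", "not spam", "not spam"]

model = train(docs, labels)
print(predict(model, "cheap followers promo"))
print(predict(model, "the debate was great"))
```

Each word's log probability is multiplied by its count in the comment, which is what treating the data as frequency data means here. Working in log space avoids numerical underflow when many small probabilities are multiplied.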