Spam Classification on 2019 Indonesian President Election Youtube Comments Using Multinomial Naïve-Bayes

Indonesian Journal of Artificial Intelligence and Data Mining Pub Date : 2019-07-08 DOI:10.24014/ijaidm.v2i1.6445

Jonathan Radot Fernando, R. Budiraharjo, Emeraldi Haganusa

Citations: 1

Abstract

Text classification is used in many areas of technology, such as spam classification, news categorization, and auto-correct for text messaging. One of the most popular algorithms for text classification today is Multinomial Naïve-Bayes. This paper explains how the Naïve-Bayes assumption is used to classify YouTube comments on the 2019 Indonesian presidential election. The algorithm predicts one of two classes: spam or not spam. Spam messages are defined as racist comments, advertising comments, and unsolicited comments. The algorithm represents text with the bag-of-words method, which defines a text as the multiset of its words. It then calculates the probability of each word given the class (spam or not spam). The main difference between the standard Naïve-Bayes algorithm and Multinomial Naïve-Bayes is the way the algorithm treats the data: Multinomial Naïve-Bayes treats the data as frequency data, which makes it suitable for text classification tasks.
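The pipeline described in the abstract (bag-of-words counts, per-class word probabilities, a spam / not spam decision) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the add-one (Laplace) smoothing parameter `alpha`, the function names `tokenize`, `train`, and `predict`, and the four toy comments are all assumptions introduced here and do not come from the paper or its dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase, split on whitespace.
    return text.lower().split()


def train(docs, labels, alpha=1.0):
    """Collect class counts and per-class word frequencies.

    alpha is an assumed Laplace smoothing constant; the abstract does not
    say which smoothing, if any, the authors used.
    """
    vocab = set()
    word_counts = {}                 # class label -> Counter of word frequencies
    class_counts = Counter(labels)   # class label -> number of documents
    for text, label in zip(docs, labels):
        bag = Counter(tokenize(text))  # bag-of-words: the multiset of words
        word_counts.setdefault(label, Counter()).update(bag)
        vocab.update(bag)
    return {
        "vocab": vocab,
        "word_counts": word_counts,
        "class_counts": class_counts,
        "alpha": alpha,
    }


def predict(model, text):
    """Return the class with the highest log posterior score."""
    bag = Counter(tokenize(text))
    n_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_class_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Log prior: fraction of training documents in this class.
        score = math.log(n_class_docs / n_docs)
        for word, freq in bag.items():
            if word not in model["vocab"]:
                continue  # words never seen in training are skipped
            # Smoothed estimate of P(word | class).
            p_word = (counts[word] + alpha) / (total_words + alpha * vocab_size)
            # Multinomial model: each occurrence of the word contributes once.
            score += freq * math.log(p_word)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy comments, for illustration only.
docs = [
    "buy cheap followers now",
    "click here to win a prize",
    "great debate last night",
    "i agree with this candidate",
]
labels = ["spam", "spam", "not spam", "not spam"]

model = train(docs, labels)
print(predict(model, "win cheap followers"))
print(predict(model, "great debate"))
```

The line `score += freq * math.log(p_word)` is where the frequency treatment mentioned in the abstract shows up: a word that occurs three times in a comment weighs three times as much as a word that occurs once, rather than being reduced to present or absent.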
