A Comparative Analysis of Classification Algorithms for Cyberbullying Crime Detection: An Experimental Study of Twitter Social Media in Indonesia

Scientific Journal of Informatics. Pub Date: 2022-10-17. DOI: 10.15294/sji.v9i2.35149

A. Muzakir, Hadi Syaputra, F. Panjaitan


Citations: 2

Abstract

Purpose: This research aims to identify cyberbullying content on Twitter and compares four classification algorithms: Naive Bayes (NB), Decision Tree (DT), Logistic Regression (LR), and Support Vector Machine (SVM). The dataset consists of Twitter data that was manually labeled and validated by language experts. It contains 1,065 records: 638 labeled non-bullying and 427 labeled bullying.

Methods: Words are weighted with the bag-of-words (BOW) method using three word-vector features: unigram, bigram, and trigram. The experiment was run in two scenarios. The first tests which of the three features gives the best accuracy; the second compares the overall performance of the algorithms across all features used.

Result: Measured by accuracy across features and algorithms, the SVM classifier outperformed the other algorithms at 76%. Measured by average recall, the DT classifier outperformed the others with an average of 76%. On overall performance (F-measure, which combines precision and recall), SVM again outperformed the other algorithms with an F-measure of 82%.

Value: Based on the experiments conducted, the SVM classification algorithm can detect words containing cyberbullying on social media.
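To make the feature and metric terminology concrete, here is a minimal sketch in plain Python of unigram, bigram, and trigram bag-of-words counting, together with the standard F-measure (the harmonic mean of precision and recall). This is not the authors' pipeline: the whitespace tokenizer, the function names, and the example sentence are assumptions made for illustration. Only the label counts (638 and 427) come from the abstract.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n):
    """Count n-grams in a text.

    Assumption: lowercasing plus whitespace splitting stands in for
    whatever preprocessing the study actually used.
    """
    return Counter(ngrams(text.lower().split(), n))


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Label counts reported in the abstract.
N_NON_BULLYING, N_BULLYING = 638, 427

# Accuracy of a trivial classifier that always predicts the majority label.
majority_baseline = N_NON_BULLYING / (N_NON_BULLYING + N_BULLYING)

if __name__ == "__main__":
    sample = "nobody likes you nobody"  # invented sentence, not from the dataset
    for n in (1, 2, 3):
        print(n, dict(bag_of_ngrams(sample, n)))
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
    print(round(majority_baseline, 3))
```

One point of context that follows from the reported label distribution: a classifier that always predicts "non-bullying" would score 638/1065, roughly 60% accuracy, so the reported 76% accuracy is best read against that baseline rather than against 50%.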

