Recurrent neural network based multiclass cyber bullying classification

Silvia Sifath, Tania Islam, Md Erfan, Samrat Kumar Dey, MD. Minhaj Ul Islam, Md Samsuddoha, Tazizur Rahman
Natural Language Processing Journal, Volume 9, Article 100111. Published 2024-10-03.
DOI: 10.1016/j.nlp.2024.100111
https://www.sciencedirect.com/science/article/pii/S2949719124000591

Abstract

Cyberbullying is a crime that has spread rapidly with the everyday use of technology, most notably when people share opinions or feelings on social media in a harmful manner. It has several negative effects on society, including depression, anxiety, and suicide. It also reduces productivity, causes psychological damage that can last a lifetime, and increases violence among people. To prevent cyberbullying or take the necessary steps against a harasser, the first step is to detect it. Several works detect and classify cyberbullying, but few address the Bengali language. As the number of people who communicate on social media in Bengali grows day by day, it is crucial to address this gap and improve both the accuracy and robustness of cyberbullying detection and classification. For this purpose, we propose an NLP-based model using machine learning and deep learning algorithms to detect and classify Bengali comments on social media. This research categorizes cyberbullying comments using a multiclass classification strategy. The dataset used to train and evaluate our model was collected from Kaggle and Melany. It contains 56,308 Bengali comments in four distinct categories: not bully, trolls, sexual, and threats. We use several machine learning algorithms (Support Vector Machine, Logistic Regression, Random Forest, XGBoost, and Multinomial Naïve Bayes), a deep learning algorithm (Recurrent Neural Network, RNN), and two fusion models. In addition, effective preprocessing steps are applied to obtain a suitable dataset. In this study, the Recurrent Neural Network gives the best accuracy, 86%. The accuracy of our model is good enough to help social media users and encourage them to practice morality.
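To make the multiclass setup concrete, the following is a minimal sketch of one of the baseline algorithms named in the abstract, Multinomial Naïve Bayes, applied to the four categories. It is not the authors' implementation: the class `MultinomialNB`, the whitespace tokenizer, the smoothing value `alpha`, and the short English training comments are all hypothetical placeholders chosen for illustration, and none of the text comes from the paper's dataset.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Tiny multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing strength, not taken from the paper

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in text.split() if t in self.vocab]
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy comments; the four labels follow the abstract's categories.
train = [
    ("have a nice day friend", "not bully"),
    ("great photo thanks for sharing", "not bully"),
    ("you are such a clown", "troll"),
    ("what a joke of a clown", "troll"),
    ("send me private pictures now", "sexual"),
    ("send private pictures to me", "sexual"),
    ("i will hurt you", "threat"),
    ("i will find you and hurt you", "threat"),
]
texts, labels = zip(*train)
model = MultinomialNB().fit(texts, labels)

for comment in ["nice photo friend", "what a clown", "i will hurt you tomorrow"]:
    print(comment, "->", model.predict(comment))
```

A real pipeline for Bengali comments would need language-appropriate tokenization and the preprocessing steps the abstract mentions, which this sketch leaves out.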