Cyberbullying detection in social media using natural language processing
Fawzya Ramadan Sayed, Eman Hassan Elnashar, Fatma A. Omara
Scientific African, Volume 28, Article e02713 (published 21 April 2025)
DOI: 10.1016/j.sciaf.2025.e02713
https://www.sciencedirect.com/science/article/pii/S2468227625001838
Abstract
Recently, the popularity of social media has increased significantly, leading to a rise in cases of cyberbullying. Many instances of cyberbullying appear in comments and posts on social media platforms such as Twitter, often causing significant emotional and psychological distress. It is therefore crucial to identify cyberbullying messages as early as possible to mitigate their impact. This paper introduces a model for detecting cyberbullying that combines Machine Learning (ML) classifiers with Natural Language Processing (NLP) techniques. The study uses a dataset of 39,870 Twitter posts and comments labeled with five classes: four types of cyberbullying (religion, age, gender, and ethnicity) and non-cyberbullying. In the proposed model, the posts are first processed using NLP techniques and then used to train the ML classifiers. The model has been implemented with five ML classifiers: Random Forest, Support Vector Machine, Logistic Regression, Naïve Bayes, and K-Nearest Neighbor. According to the implementation results, these classifiers achieve accuracy rates of 94%, 93%, 92%, 92%, and 73%, respectively. Random Forest therefore achieves the highest accuracy and outperforms the other classifiers, whereas K-Nearest Neighbor achieves the lowest.
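The abstract does not say which NLP preprocessing steps or text features the authors used. As a minimal sketch, assuming common tweet-cleaning steps (lowercasing, removing URLs and @mentions, keeping letters only, dropping a small stop-word list) and bag-of-words counts, the pure-Python code below trains one of the five listed classifiers, multinomial Naïve Bayes with add-one smoothing, and measures accuracy. The stop-word list, the names `preprocess`, `NaiveBayesText`, and `accuracy`, and the toy tweets are hypothetical illustrations, not the paper's code or data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the paper's actual preprocessing is not given in the abstract.
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "your", "to", "of",
              "and", "in", "at", "for", "this", "too"}


def preprocess(text):
    """Clean one tweet and return its tokens (assumed NLP steps, not the paper's)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs first (they may contain '@')
    text = re.sub(r"@\w+", " ", text)           # drop @mentions
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters only; '#' goes, hashtag word stays
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class NaiveBayesText:
    """Multinomial Naive Bayes over token counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class (for priors)
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(preprocess(doc))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_class_docs in self.class_counts.items():
            score = math.log(n_class_docs / n_docs)  # log prior
            for tok in preprocess(doc):
                if tok not in self.vocab:
                    continue                         # skip tokens never seen in training
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (self.total_tokens[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Ten hypothetical training tweets (two per class), NOT taken from the paper's dataset.
TRAIN = [
    ("You are too old for apps, boomer!", "age"),
    ("Grandpa is too old to post", "age"),
    ("Girls are bad at games", "gender"),
    ("Women belong in the kitchen", "gender"),
    ("Your religion is a silly joke", "religion"),
    ("Your faith is pure nonsense", "religion"),
    ("Go back to your country", "ethnicity"),
    ("Your race ruins this neighborhood", "ethnicity"),
    ("Great game tonight @coach https://t.co/abc", "not_cyberbullying"),
    ("Loving the sunny #weekend", "not_cyberbullying"),
]
TEST = [
    ("Stop posting, you are too old!", "age"),
    ("What a sunny weekend!", "not_cyberbullying"),
    ("Your religion is silly", "religion"),
    ("Women are bad at games", "gender"),
    ("Go back home", "ethnicity"),
]

if __name__ == "__main__":
    model = NaiveBayesText().fit([d for d, _ in TRAIN], [y for _, y in TRAIN])
    print(accuracy(model, [d for d, _ in TEST], [y for _, y in TEST]))
```

On these toy tweets the sketch labels all five held-out examples correctly, which says nothing about accuracy on the 39,870-post dataset. A real replication would more likely use scikit-learn, which also provides implementations of the other four classifiers (Random Forest, Support Vector Machine, Logistic Regression, and K-Nearest Neighbor).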