Cyberbullying detection of resource constrained language from social media using transformer-based approach

Natural Language Processing Journal, Volume 9, Article 100104. Pub Date: 2024-09-16. DOI: 10.1016/j.nlp.2024.100104

Syed Sihab-Us-Sakib, Md. Rashadur Rahman, Md. Shafiul Alam Forhad, Md. Atiq Aziz


Citations: 0

Abstract



The rise of the internet and social media has facilitated diverse interactions among individuals, but it has also led to an increase in cyberbullying—a phenomenon with detrimental effects on mental health, including the potential to induce suicidal thoughts. To combat this issue, we have developed the Cyberbullying Bengali Dataset (CBD), a novel resource containing 2751 manually labeled texts categorized into five classes, including various forms of cyberbullying and non-bullying instances. In our study on cyberbullying detection, we conducted an extensive evaluation of various machine learning and deep learning models. Specifically, we examined Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Random Forest (RF) among the traditional machine learning models. For deep learning models, we explored Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM). We have also experimented with state-of-the-art transformer architectures, including m-BERT, BanglaBERT, and XLM-RoBERTa. After rigorous experimentation, XLM-RoBERTa emerged as the most effective model, achieving a significant F1-score of 0.83 and an accuracy of 82.61%, outperforming all other models. Our work provides insights into effective cyberbullying detection on platforms like Facebook, YouTube, and Instagram.
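The two headline figures above, accuracy and F1-score, can be illustrated with a small sketch. The abstract does not say how F1 was averaged across the five classes or what the class labels are, so macro averaging and the integer labels 0 to 4 below are assumptions for illustration only, and the toy predictions are not data from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in gold or predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # gold class t was missed
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN), guarded against an empty denominator
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical five-class labels 0..4 (assumption: the abstract does not name the classes).
y_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 4, 0, 1, 2, 4, 3]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.80
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")  # macro F1 = 0.80
```

With imbalanced classes, macro F1 and accuracy can diverge noticeably, which may be one reason both are reported.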

