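Combatting online harassment by using transformer language models for the detection of emotions, hate speech and offensive language on social media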

2022 4th International Conference on Emerging Trends in Electrical, Electronic and Communications Engineering (ELECOM). Pub Date: 2022-11-22. DOI: 10.1109/ELECOM54934.2022.9965237

Doorgesh Sookarah, Loovesh S. Ramwodin



Abstract

Social media is now omnipresent, and most people use at least one of these digital platforms. Social media generates an enormous amount of data, which presents an unparalleled opportunity for data scientists and linguistic experts. These factors have renewed interest in Natural Language Processing techniques, and the number of publications dealing with Tweet classification using machine learning models continues to increase. In this paper, experiments performed by the TweetEval team from Cardiff University are studied and expanded upon. The tasks considered are emotion detection, offensive language identification and hate speech detection. These specific classification tasks were chosen because they relate directly to unwanted behaviours such as online harassment. The research involved building and testing a transformer-based language model capable of matching the performance reported for TweetEval. The aim of the study is therefore to identify common limitations of such models and how these can be circumvented in order to combat phenomena such as cyberbullying and online abuse effectively using machine learning. The developed BERT model performed comparably to other similar algorithms on all tasks, obtaining F1-scores of 0.51, 0.76 and 0.80 for hate speech, emotion detection and offensive language respectively.
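
The abstract reports F1-scores without saying how they were averaged. As a minimal sketch, assuming the macro-averaged F1 that the TweetEval benchmark uses for these three tasks, the metric can be computed as below. The function name `macro_f1` and the toy labels are illustrative only and do not come from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores.

    Assumption: macro averaging; the abstract itself only says "F1-Score".
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels for a binary hate speech task (not data from the paper).
gold = ["hate", "hate", "not", "not", "not", "not"]
pred = ["hate", "not", "not", "not", "not", "hate"]
print(macro_f1(gold, pred))
```

Because every class contributes equally to the mean, a model that does well only on the majority class is penalised. In the toy example the minority class scores 0.5 and the majority class 0.75, so the macro average (0.625) is lower than the plain accuracy (4 of 6 correct).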
