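Combatting online harassment by using transformer language models for the detection of emotions, hate speech and offensive language on social media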

Doorgesh Sookarah, Loovesh S. Ramwodin
2022 4th International Conference on Emerging Trends in Electrical, Electronic and Communications Engineering (ELECOM), published 22 November 2022. DOI: 10.1109/ELECOM54934.2022.9965237 (https://doi.org/10.1109/ELECOM54934.2022.9965237)
Citations: 0

Abstract

Social media is now omnipresent, and most people use at least one of these digital platforms. Social entertainment generates an enormous amount of data, which is an unparalleled opportunity for data scientists and linguistic experts. These factors have renewed interest in Natural Language Processing techniques, and the number of publications on Tweet classification using machine learning models continues to grow. In this paper, experiments performed by the TweetEval team from Cardiff University have been studied and expanded upon. The tasks considered are emotion detection, offensive language identification and hate speech detection. These specific classification tasks were chosen because they relate directly to unwanted behaviours such as online harassment. This research involved building and testing a transformer-based language model capable of matching the performance of TweetEval. The aim of this study is therefore to identify common limitations of such models and how these can be circumvented to effectively combat phenomena such as cyberbullying and online abuse using machine learning. In the results obtained, the developed BERT model performed comparably to other similar algorithms on all tasks, achieving F1-scores of 0.51, 0.76 and 0.80 for hate speech detection, emotion detection and offensive language identification, respectively.
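The results above are reported as F1-scores. The abstract does not say how the score is averaged over classes, so the sketch below assumes macro-averaging: compute F1 = 2PR / (P + R) for each class, then take the unweighted mean. Macro-averaging is a common choice for multi-class tweet tasks, but treat it here as an assumption. The function names `f1_per_class` and `macro_f1` are hypothetical helpers, not code from the paper, and the emotion labels in the usage lines are toy data, not results from the study.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def f1_per_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return the F1-score for every label that appears in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # class p was predicted but was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    if not scores:
        raise ValueError("cannot compute F1 on empty inputs")
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy emotion labels for illustration only.
    y_true = ["anger", "joy", "joy", "sadness"]
    y_pred = ["anger", "joy", "sadness", "sadness"]
    print(round(macro_f1(y_true, y_pred), 4))  # → 0.7778
```

In the toy run, "anger" scores 1.0 while "joy" and "sadness" each score 2/3, so the macro mean is 7/9. Because every class counts equally, a weak minority class pulls the macro score down more than it would under accuracy or a frequency-weighted average.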