Towards Cyberbullying Detection: Building, Benchmarking and Longitudinal Analysis of Aggressiveness and Conflicts/Attacks Datasets From Twitter

IF 9.8 · Zone 2, Computer Science · JCR Q1, Computer Science, Artificial Intelligence

IEEE Transactions on Affective Computing · Pub Date: 2024-12-16 · DOI: 10.1109/TAFFC.2024.3518587

Paula Ferreira;Nádia Pereira;Hugo Rosa;Sofia Oliveira;Luísa Coheur;Sofia Francisco;Sidclay Souza;Ricardo Ribeiro;João P. Carvalho;Paula Paulino;Isabel Trancoso;Ana Margarida Veiga-Simão

Vol. 16, No. 3, pp. 1473-1487 · https://ieeexplore.ieee.org/document/10804011/

Abstract

Offense and hate speech fuel online conflicts, which have become common on social media; their study is accordingly a growing research topic in machine learning and natural language processing. This article presents two Portuguese-language offense-related datasets that deepen the study of the subject: an Aggressiveness dataset and a Conflicts/Attacks dataset. While the former is similar to other offense-detection datasets, the latter is novel in that it uses the history of the interaction between users. Four studies were carried out to construct and analyze the datasets. The first gathered expressions of verbal aggression witnessed by adolescents to guide data extraction. The second extracted Portuguese-language data from Twitter that matched the most frequent expressions, words, and sentences identified in the first study. The third developed the Aggressiveness dataset, the Conflicts/Attacks dataset, and classification models. The fourth examined, with a sample of 86 adolescents, whether online aggression and conflicts/attacks showed any trend changes over time, and investigated whether the number of tweets sent over a period of 273 days was related to online aggression and conflicts/attacks. Finally, we analyzed the percentage of participants who took part in the aggressions and/or conflicts/attacks.
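
The expression-matching step of the second study can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how matching was performed, so the accent-insensitive, whole-word regex approach, the function names (`normalize`, `build_matcher`, `filter_tweets`), and the sample expressions and tweets are all hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Case-fold and strip diacritics so 'ESTÚPIDO' and 'estupido' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_matcher(expressions):
    """Compile one regex that matches any seed expression as a whole word or phrase."""
    # Longest alternatives first, so multi-word phrases win over their prefixes.
    parts = sorted((re.escape(normalize(e)) for e in expressions), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(parts) + r")(?!\w)")


def filter_tweets(tweets, expressions):
    """Return the tweets containing at least one seed expression."""
    matcher = build_matcher(expressions)
    return [t for t in tweets if matcher.search(normalize(t))]


# Hypothetical seed expressions and tweets, for illustration only.
seed_expressions = ["estúpido", "cala a boca"]
sample_tweets = ["És mesmo ESTUPIDO!", "Bom dia a todos", "Cala a boca, pá"]
matched = filter_tweets(sample_tweets, seed_expressions)
```

Here `matched` keeps the first and third tweets. Normalizing both sides is a design choice made for this sketch, since informal Portuguese text often omits accents; a real collection step would also need to handle spelling variants and the platform's query limits.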

Source journal

IEEE Transactions on Affective Computing (Computer Science, Artificial Intelligence; Computer Science, Cybernetics)

CiteScore

15.00

Self-citation rate

6.20%

Articles published

174

Journal description: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. It also welcomes surveys of existing work that provide new perspectives on the historical and future directions of this field.