Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021) — Latest Publications

Racist or Sexist Meme? Classifying Memes beyond Hateful
Haris Bin Zia, Ignacio Castro, Gareth Tyson. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.23
Abstract: Memes are combinations of text and images that are often humorous in nature. That is not always the case, however: certain combinations of text and images may depict hate, and are referred to as hateful memes. This work presents a multimodal pipeline that takes both visual and textual features of memes into account to (1) identify the protected category (e.g. race, sex) that has been attacked and (2) detect the type of attack (e.g. contempt, slurs). Our pipeline uses state-of-the-art pre-trained visual and textual representations, followed by a simple logistic regression classifier. We apply our pipeline to the Hateful Memes Challenge dataset, extended with newly created fine-grained labels for protected category and type of attack. Our best model achieves an AUROC of 0.96 for identifying the protected category and 0.97 for detecting the type of attack. We release our code at https://github.com/harisbinzia/HatefulMemes
Citations: 15
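The pipeline shape described in this abstract — pre-trained visual and textual embeddings fused into one vector and fed to a logistic regression classifier — can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' code: the four-dimensional toy "embeddings" stand in for real pre-trained features, and the trainer is a bare-bones SGD logistic regression.

```python
import math

def concat_features(visual_vec, text_vec):
    """Fusion step: one joint feature vector per meme."""
    return visual_vec + text_vec

def train_logreg(X, y, lr=0.5, epochs=200):
    """Bare-bones logistic regression trained with SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            g = p - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))

# Toy stand-ins for pre-trained visual/textual embeddings (hypothetical values).
memes = [([1.0, 0.0], [1.0, 0.0], 1),
         ([0.9, 0.1], [0.8, 0.2], 1),
         ([0.0, 1.0], [0.0, 1.0], 0),
         ([0.1, 0.9], [0.2, 0.8], 0)]
X = [concat_features(v, t) for v, t, _ in memes]
y = [label for _, _, label in memes]
w, b = train_logreg(X, y)
```

In the real pipeline the embeddings would come from frozen pre-trained encoders; only the lightweight classifier on top is trained, which is what makes the approach cheap to run per label set.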
A Large-Scale English Multi-Label Twitter Dataset for Cyberbullying and Online Abuse Detection
S. Salawu, Joan A. Lumsden, Yulan He. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.16
Abstract: In this paper, we introduce a new English Twitter-based dataset for cyberbullying and online abuse detection. Comprising 62,587 tweets, the dataset was sourced from Twitter using query terms designed to retrieve tweets with high probabilities of various forms of bullying and offensive content, including insult, trolling, profanity, sarcasm, threat, porn and exclusion. We recruited a pool of 17 annotators to perform fine-grained annotation, with each tweet annotated by three annotators. All annotators are high-school educated and frequent users of social media. Inter-rater agreement, measured by Krippendorff's Alpha, is 0.67. Analysis of the dataset confirmed common cyberbullying themes reported by other studies and revealed interesting relationships between the classes. The dataset was used to train a number of transformer-based deep learning models, returning impressive results.
Citations: 7
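The agreement statistic quoted above (Krippendorff's Alpha = 0.67) has a compact definition for nominal labels: one minus the ratio of observed to chance-expected disagreement, computed from a coincidence matrix. A minimal sketch follows — this is not the authors' tooling, and real studies typically use a vetted library such as the `krippendorff` package; degenerate samples where only one label value ever occurs are not handled.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.
    units: one list of labels per item, from however many annotators rated it."""
    pair_counts, value_counts = Counter(), Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # single-annotator items carry no agreement information
        for a, b in permutations(labels, 2):
            pair_counts[(a, b)] += 1 / (m - 1)  # coincidence-matrix weighting
    n = sum(pair_counts.values())
    for (a, _), c in pair_counts.items():
        value_counts[a] += c
    # Observed disagreement: off-diagonal mass of the coincidence matrix.
    observed = sum(c for (a, b), c in pair_counts.items() if a != b) / n
    # Expected disagreement under chance, from marginal label frequencies.
    expected = sum(value_counts[a] * value_counts[b]
                   for a in value_counts for b in value_counts
                   if a != b) / (n * (n - 1))
    return 1 - observed / expected
```

Perfect agreement yields 1.0, chance-level agreement yields roughly 0.0, and systematic disagreement goes negative; 0.67 on a fine-grained multi-label task is generally considered reasonable.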
Targets and Aspects in Social Media Hate Speech
A. Shvets, Paula Fortuna, Juan Soler, L. Wanner. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.19
Abstract: Mainstream research on hate speech has so far focused predominantly on classifying social media posts with respect to predefined typologies of rather coarse-grained hate speech categories. This may be sufficient if the goal is to detect and delete abusive posts. However, removal is not always possible under the legislation of a given country, and there is evidence that hate speech cannot be successfully combated by merely removing posts; it should be countered by education and counter-narratives. For this purpose, we need to identify (i) who is the target of a given hate speech post and (ii) what aspects (or characteristics) of the target are attributed to the target in the post. As a first approximation, we propose to adapt a generic state-of-the-art concept extraction model to the hate speech domain. The outcome of the experiments is promising and can serve as inspiration for further work on the task.
Citations: 9
Jibes & Delights: A Dataset of Targeted Insults and Compliments to Tackle Online Abuse
Ravsimar Sodhi, Kartikey Pant, Radhika Mamidi. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.14
Abstract: Online abuse and offensive language on social media have become widespread problems in today's digital age. In this paper, we contribute a Reddit-based dataset consisting of 68,159 insults and 51,102 compliments targeted at individuals rather than at a particular community or race. We then benchmark multiple existing state-of-the-art models on the dataset for both classification and unsupervised style transfer. Finally, we analyse the experimental results and conclude that the transfer task is challenging, requiring the models to understand the high degree of creativity exhibited in the data.
Citations: 3
Multimodal or Text? Retrieval or BERT? Benchmarking Classifiers for the Shared Task on Hateful Memes
Vasiliki Kougia, John Pavlopoulos. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.24
Abstract: The Shared Task on Hateful Memes is a challenge that aims at detecting hateful content in memes by inviting systems that understand memes, potentially by combining image and textual information. The challenge consists of three detection tasks: hate, protected category and attack type. The first is a binary classification task, while the other two are multi-label classification tasks. Our participation included a text-based BERT baseline (TxtBERT), the same model with added information from the image (ImgBERT), and neural retrieval approaches. We also experimented with retrieval-augmented classification models. We found that an ensemble of TxtBERT and ImgBERT achieves the best performance in terms of ROC AUC on our development set in two of the three tasks.
Citations: 2
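Two pieces of the setup above are easy to make concrete: combining TxtBERT and ImgBERT scores into an ensemble, and evaluating with ROC AUC. The sketch below assumes score averaging for the ensemble (one common choice; the paper does not pin this detail down here) and uses the standard rank-based Mann–Whitney formulation of AUC rather than the authors' evaluation code.

```python
def ensemble(p_text, p_image):
    """Average per-meme probabilities from the two classifiers."""
    return [(a + b) / 2 for a, b in zip(p_text, p_image)]

def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because AUC depends only on the ranking of scores, averaging two calibrated classifiers can improve it whenever their ranking errors are not perfectly correlated, which is the usual motivation for this kind of ensemble.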
Exploiting Auxiliary Data for Offensive Language Detection with Bidirectional Transformers
Sumer Singh, Sheng Li. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.1
Abstract: Offensive language detection (OLD) has received increasing attention due to its societal impact. Recent work shows that bidirectional transformer-based methods obtain impressive performance on OLD, but such methods usually rely on large-scale, well-labelled OLD datasets for model training. To address data and label scarcity in OLD, we propose a simple yet effective domain adaptation approach for training bidirectional transformers. Our approach introduces domain adaptation (DA) training procedures to ALBERT, so that it can effectively exploit auxiliary data from source domains to improve OLD performance in a target domain. Experimental results on benchmark datasets show that our approach, ALBERT (DA), obtains state-of-the-art performance in most cases, and particularly benefits underrepresented and under-performing classes, with a significant improvement over ALBERT.
Citations: 4
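The core recipe — train on plentiful auxiliary source-domain data first, then continue training the same weights on the small target-domain set — is independent of the backbone. The toy sketch below substitutes a perceptron for ALBERT purely to show the two-stage structure; all data values are invented, and the paper's actual DA procedure involves more than weight reuse.

```python
def fit(weights, data, lr=0.1, epochs=50):
    """One training stage: simple perceptron updates on (features, label) pairs."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def classify(weights, x):
    w, b = weights
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Stage 1: train on plentiful auxiliary source-domain data (toy values).
source = [([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
# Stage 2: continue from the source-trained weights on the small target set,
# instead of starting from scratch — the essence of the adaptation recipe.
target = [([0.8, 0.1], 1), ([0.2, 0.8], 0)]

w_source = fit(([0.0, 0.0], 0.0), source)
w_target = fit(w_source, target)
```

With a transformer the same structure appears as sequential fine-tuning runs, where the second run inherits the first run's checkpoint; the benefit for rare classes comes from the source domain supplying examples the target domain lacks.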
Toxic Comment Collection: Making More Than 30 Datasets Easily Accessible in One Unified Format
Julian Risch, Philipp Schmidt, Ralf Krestel. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.17
Abstract: With the rise of research on toxic comment classification, more and more annotated datasets have been released. The wide variety of the task (different languages, labelling processes and schemes) has led to a large number of heterogeneous datasets that can be used for training and testing very specific settings. Despite recent efforts to create web pages that provide an overview, most publications still use only a single dataset. The datasets are not stored in one central database, they come in many different data formats, and it is difficult to interpret their class labels and reuse them in other projects. To overcome these issues, we present a collection of more than thirty datasets in the form of a software tool that automates downloading and processing of the data and presents it in a unified data format that also offers a mapping of compatible class labels. The tool additionally gives an overview of the properties of the available datasets, such as languages, platforms and class labels, to make it easier to select suitable training and test data.
Citations: 2
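The unification step such a tool performs can be illustrated with a small normaliser that maps each dataset's label scheme onto one shared vocabulary and schema. Everything below — the dataset names, field names, and label mappings — is hypothetical, chosen only to show the idea; the actual tool defines its own schema and mappings.

```python
# Hypothetical per-dataset label mappings onto one shared label vocabulary.
LABEL_MAPS = {
    "dataset_a": {"toxic": "toxicity", "ok": "none"},
    "dataset_b": {"hateful": "hate", "offensive": "toxicity", "neither": "none"},
}

def normalise(record, source):
    """Convert one raw record into a unified schema:
    {text, label, language, platform, source}."""
    return {
        "text": record["text"],
        "label": LABEL_MAPS[source][record["label"]],
        "language": record.get("lang", "en"),
        "platform": record.get("platform", "unknown"),
        "source": source,
    }

rows = [normalise({"text": "some comment", "label": "ok"}, "dataset_a"),
        normalise({"text": "another one", "label": "offensive",
                   "platform": "forum"}, "dataset_b")]
```

The payoff of this design is that a training script can filter on `language`, `platform`, or the shared `label` field without knowing anything about the thirty-plus original formats.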
Findings of the WOAH 5 Shared Task on Fine Grained Hateful Memes Detection
Lambert Mathias, Shaoliang Nie, Aida Mostafazadeh Davani, Douwe Kiela, Vinodkumar Prabhakaran, Bertie Vidgen, Zeerak Talat. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.21
Abstract: We present the results and main findings of the WOAH 5 shared task on hateful memes detection. The task includes two subtasks relating to distinct challenges in the fine-grained detection of hateful memes: (1) the protected category attacked by the meme and (2) the attack type. Three teams submitted system description papers. This shared task builds on the hateful memes detection task created by Facebook AI Research in 2020.
Citations: 12
Fine-Grained Fairness Analysis of Abusive Language Detection Systems with CheckList
Marta Marchiori Manerba, Sara Tonelli. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.9
Abstract: Current abusive language detection systems have demonstrated unintended bias towards sensitive features such as nationality or gender. This is a crucial issue that may harm minorities and underrepresented groups if such systems are integrated into real-world applications. In this paper, we create ad hoc tests through the CheckList tool (Ribeiro et al., 2020) to detect biases within abusive language classifiers for English. We compare the behaviour of two BERT-based models, one trained on a generic hate speech dataset and the other on a dataset for misogyny detection. Our evaluation shows that, although BERT-based classifiers achieve high accuracy on a variety of natural language processing tasks, they perform very poorly with regard to fairness and bias, in particular on samples involving implicit stereotypes, expressions of hate towards minorities, and protected attributes such as race or sexual orientation. We release both the notebooks implemented to extend the fairness tests and the synthetic datasets usable to evaluate system bias independently of CheckList.
Citations: 6
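The style of test this abstract describes can be sketched without the CheckList library itself: expand a neutral template over a protected-attribute slot, then check that the classifier's prediction is invariant across the variants. This is a simplified stand-in for CheckList's invariance tests, not its actual API, and the "biased" classifier and group names below are invented for illustration.

```python
from itertools import product

def expand(template, slots):
    """Fill every combination of slot values into the template."""
    keys = list(slots)
    return [template.format(**dict(zip(keys, vals)))
            for vals in product(*(slots[k] for k in keys))]

def invariance_failures(classifier, template, slots):
    """A fair classifier should give the same prediction for every
    filled-in variant of a neutral template; return the variants where
    the prediction flips relative to the first one."""
    texts = expand(template, slots)
    preds = [classifier(t) for t in texts]
    return [t for t, p in zip(texts, preds) if p != preds[0]]

# Toy biased classifier (hypothetical): flags any mention of "group_b".
biased = lambda text: "abusive" if "group_b" in text else "ok"

failures = invariance_failures(
    biased,
    "I had lunch with my {identity} friend.",
    {"identity": ["group_a", "group_b"]},
)
```

A non-empty `failures` list is exactly the kind of fine-grained bias signal the paper reports: the sentence content is harmless, so any prediction flip is attributable to the identity term alone.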
Abusive Language on Social Media Through the Legal Looking Glass
Thales Bertaglia, A. Grigoriu, M. Dumontier, Gijs van Dijck. Pub Date: 2021-08-01. DOI: 10.18653/v1/2021.woah-1.20
Abstract: Abusive language is a growing phenomenon on social media platforms, and its effects can reach beyond the online context, contributing to mental or emotional stress for users. Automatic tools for detecting abuse can alleviate the issue. In practice, developing automated methods to detect abusive language relies on good-quality data, but there is currently a lack of standards for creating datasets in the field. Such standards include definitions of what is considered abusive language, annotation guidelines, and reporting on the process. This paper introduces an annotation framework inspired by legal concepts to define abusive language in the context of online harassment. The framework uses a 7-point Likert scale for labelling instead of class labels. We also present ALYT, a dataset of abusive language on YouTube comprising English comments extracted from videos on different controversial topics and labelled by law students. The comments were sampled from the actual collected data, without artificial methods for increasing the abusive content. The paper describes the annotation process thoroughly, including all guidelines and training steps.
Citations: 4
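Likert-scale labels like those above eventually need aggregating across annotators, and often collapsing to a class for downstream classifiers. The sketch below shows one common recipe (median score plus a threshold); the scale orientation and threshold are assumptions for illustration, not the paper's scheme.

```python
from statistics import median

def aggregate(scores, threshold=4):
    """Collapse one comment's 7-point Likert ratings into a median score
    and a coarse binary flag. Assumed orientation: 1 = not abusive,
    7 = highly abusive; threshold is the scale midpoint (hypothetical)."""
    m = median(scores)
    return {"median": m, "abusive": m > threshold}
```

Keeping the raw Likert scores alongside any derived class label is what makes a dataset like this reusable: a later project can pick its own threshold, or model the full scale directly.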