Improving Zero-Shot Cross-Lingual Hate Speech Detection with Pseudo-Label Fine-Tuning of Transformer Language Models

Haris Bin Zia, Ignacio Castro, A. Zubiaga, Gareth Tyson
International Conference on Web and Social Media (ICWSM), published 2022-05-31. DOI: 10.1609/icwsm.v16i1.19402
Citations: 11

Abstract

Hate speech has proliferated on social media platforms in recent years. While this has been the focus of many studies, most works have focused exclusively on a single language, generally English. Low-resource languages have been neglected due to the dearth of labeled resources. These languages, however, represent an important portion of the data due to the multilingual nature of social media. This work presents a novel zero-shot, cross-lingual transfer learning pipeline based on pseudo-label fine-tuning of Transformer Language Models for automatic hate speech detection. We employ our pipeline on benchmark datasets covering English (source) and 6 different non-English (target) languages written in 3 different scripts. Our pipeline achieves an average improvement of 7.6% (in terms of macro-F1) over previous zero-shot, cross-lingual models. This demonstrates the feasibility of high-accuracy automatic hate speech detection for low-resource languages. We release our code and models at https://github.com/harisbinzia/ZeroshotCrosslingualHateSpeech.
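The abstract describes the pipeline only at a high level. As a rough illustration of the general idea of pseudo-label (self-training) fine-tuning in a zero-shot setting, and of the macro-F1 metric used to report results, here is a minimal Python sketch. It is not the authors' implementation (their code is in the repository linked above). A tiny logistic-regression model stands in for a Transformer, toy 2-D vectors stand in for multilingual sentence embeddings, and `LogisticModel`, `select_pseudo_labels`, `zero_shot_pseudo_label` and the 0.9 confidence threshold are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticModel:
    """Hypothetical stand-in for a fine-tuned multilingual Transformer classifier.

    Inputs are assumed to be fixed-size sentence embeddings; label 1 = hateful.
    """

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.prob(x) >= 0.5 else 0

    def fit(self, xs: List[Vector], ys: List[int],
            epochs: int = 200, lr: float = 0.5) -> "LogisticModel":
        # Full-batch gradient descent on log-loss. It starts from the current
        # weights, so a second call behaves like further fine-tuning.
        n = len(xs)
        for _ in range(epochs):
            gw = [0.0] * len(self.w)
            gb = 0.0
            for x, y in zip(xs, ys):
                err = self.prob(x) - y
                for i, xi in enumerate(x):
                    gw[i] += err * xi
                gb += err
            self.w = [wi - lr * g / n for wi, g in zip(self.w, gw)]
            self.b -= lr * gb / n
        return self


def select_pseudo_labels(model, unlabeled, threshold=0.9):
    # Keep only target examples the current model labels with high confidence.
    xs, ys = [], []
    for x in unlabeled:
        p = model.prob(x)
        if max(p, 1.0 - p) >= threshold:
            xs.append(x)
            ys.append(1 if p >= 0.5 else 0)
    return xs, ys


def macro_f1(gold, pred, labels=(0, 1)):
    # Unweighted mean of per-class F1, so the hateful class counts equally.
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


def zero_shot_pseudo_label(src_x, src_y, tgt_unlabeled, threshold=0.9):
    # 1) Train on labeled source-language (e.g. English) data only.
    model = LogisticModel(dim=len(src_x[0])).fit(src_x, src_y)
    # 2) Pseudo-label unlabeled target-language data with the source model.
    px, py = select_pseudo_labels(model, tgt_unlabeled, threshold)
    # 3) Fine-tune on the confident pseudo-labels; no gold target labels are used.
    if px:
        model.fit(px, py)
    return model


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real inputs would come from a multilingual encoder.
    src_x = [(2.0, 0.0), (1.5, 0.0), (-2.0, 0.0), (-1.5, 0.0)]
    src_y = [1, 1, 0, 0]
    tgt_x = [(3.0, 0.0), (1.0, 0.0), (-3.0, 0.0), (-1.0, 0.0)]
    tgt_gold = [1, 1, 0, 0]  # used only for evaluation, never for training
    model = zero_shot_pseudo_label(src_x, src_y, tgt_x)
    print(macro_f1(tgt_gold, [model.predict(x) for x in tgt_x]))
```

Filtering by confidence is one common way to limit the noise that wrong pseudo-labels feed back into training; the abstract does not say whether the paper filters, weights, or uses all pseudo-labels, nor whether the reported 7.6% is an absolute or relative macro-F1 gain.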