Protecting marginalized communities by mitigating discrimination in toxic language detection

Farshid Faal, K. Schmitt, Jia Yuan Yu
{"title":"Protecting marginalized communities by mitigating discrimination in toxic language detection","authors":"Farshid Faal, K. Schmitt, Jia Yuan Yu","doi":"10.1109/istas52410.2021.9629201","DOIUrl":null,"url":null,"abstract":"As the harms of online toxic language become more apparent, countering online toxic behavior is an essential application of natural language processing. The first step in managing toxic language risk is identification, but algorithmic approaches have themselves demonstrated bias. Texts containing some demographic identity terms such as gay or Black are more likely to be labeled as toxic in existing toxic language detection datasets. In many machine learning models introduced for toxic language detection, non-toxic comments containing minority and marginalized community-specific identity terms were given unreasonably high toxicity scores. To address the challenge of bias in toxic language detection, we propose a two-step training approach. A pretrained language model with a multitask learning objective will mitigate biases in the toxicity classifier prediction. Experiments demonstrate that jointly training the pretrained language model with a multitask objective can effectively mitigate the impacts of unintended biases and is more robust to model bias towards commonly-attacked identity groups presented in datasets without significantly hurting the model’s generalizability.","PeriodicalId":314239,"journal":{"name":"2021 IEEE International Symposium on Technology and Society (ISTAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Symposium on Technology and Society (ISTAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/istas52410.2021.9629201","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

As the harms of online toxic language become more apparent, countering online toxic behavior is an essential application of natural language processing. The first step in managing toxic language risk is identification, but algorithmic approaches have themselves demonstrated bias. Texts containing some demographic identity terms, such as gay or Black, are more likely to be labeled as toxic in existing toxic language detection datasets. In many machine learning models introduced for toxic language detection, non-toxic comments containing identity terms specific to minority and marginalized communities were given unreasonably high toxicity scores. To address the challenge of bias in toxic language detection, we propose a two-step training approach: a pretrained language model trained with a multitask learning objective mitigates bias in the toxicity classifier's predictions. Experiments demonstrate that jointly training the pretrained language model with a multitask objective effectively mitigates the impact of unintended biases and is more robust to bias against commonly attacked identity groups present in the datasets, without significantly hurting the model's generalizability.
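The abstract does not spell out the exact form of the multitask objective. As a minimal sketch only, and not the authors' implementation, one common formulation combines a language-modeling loss with a toxicity-classification loss over a shared encoder, weighted by a mixing coefficient. Every name below (MultitaskToxicityModel, multitask_loss, alpha, the layer sizes) is an illustrative assumption.

```python
import torch
import torch.nn as nn

class MultitaskToxicityModel(nn.Module):
    """Shared text encoder with two heads: token-level language modeling
    and sequence-level toxicity classification (illustrative sketch)."""

    def __init__(self, vocab_size=30522, hidden=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.lm_head = nn.Linear(hidden, vocab_size)   # predicts tokens
        self.clf_head = nn.Linear(hidden, 2)           # toxic / non-toxic

    def forward(self, input_ids):
        h = self.encoder(self.embed(input_ids))        # (batch, seq, hidden)
        return self.lm_head(h), self.clf_head(h.mean(dim=1))

def multitask_loss(lm_logits, clf_logits, lm_labels, tox_labels, alpha=0.5):
    """Joint objective: alpha weights the language-modeling term against
    the toxicity-classification term (alpha is a hypothetical hyperparameter)."""
    lm_loss = nn.functional.cross_entropy(
        lm_logits.view(-1, lm_logits.size(-1)), lm_labels.view(-1))
    clf_loss = nn.functional.cross_entropy(clf_logits, tox_labels)
    return alpha * lm_loss + (1 - alpha) * clf_loss

# Toy usage: two sequences of length 8 with random token ids
input_ids = torch.randint(0, 30522, (2, 8))
lm_labels = torch.randint(0, 30522, (2, 8))
tox_labels = torch.tensor([0, 1])
model = MultitaskToxicityModel()
lm_logits, clf_logits = model(input_ids)
multitask_loss(lm_logits, clf_logits, lm_labels, tox_labels).backward()
```

The intuition behind this kind of joint objective is that the language-modeling term keeps the encoder's representations general rather than letting them collapse onto spurious identity-term correlations, which is consistent with the robustness claim in the abstract; the specific weighting and architecture here are assumptions, not the paper's reported setup.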