Title: Protecting marginalized communities by mitigating discrimination in toxic language detection
Authors: Farshid Faal, K. Schmitt, Jia Yuan Yu
DOI: 10.1109/istas52410.2021.9629201
Published in: 2021 IEEE International Symposium on Technology and Society (ISTAS)
Publication date: 2021-10-28
Citations: 0
Abstract
As the harms of online toxic language become more apparent, countering online toxic behavior is an essential application of natural language processing. The first step in managing toxic language risk is identification, but algorithmic approaches have themselves demonstrated bias: in existing toxic language detection datasets, texts containing demographic identity terms such as "gay" or "Black" are more likely to be labeled as toxic. Many machine learning models introduced for toxic language detection assign unreasonably high toxicity scores to non-toxic comments that contain identity terms specific to minority and marginalized communities. To address this bias, we propose a two-step training approach in which a pretrained language model is trained with a multitask learning objective to mitigate biases in the toxicity classifier's predictions. Experiments demonstrate that jointly training the pretrained language model with a multitask objective effectively mitigates the impact of unintended biases and makes the model more robust to bias against commonly attacked identity groups present in datasets, without significantly hurting the model's generalizability.
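The joint multitask setup described above can be sketched as a shared encoder feeding two heads, a toxicity classifier and a language-modeling head, whose losses are combined into one objective. The sketch below is a minimal illustration, not the paper's actual architecture: the tiny GRU encoder, the head definitions, and the `lm_weight` hyperparameter are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class MultitaskToxicityModel(nn.Module):
    """Hypothetical sketch: a shared encoder with two task heads.

    The toxicity head classifies the whole sequence; the LM head
    predicts the next token, serving as the auxiliary objective.
    """
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.tox_head = nn.Linear(hidden, 2)          # toxic / non-toxic
        self.lm_head = nn.Linear(hidden, vocab_size)  # next-token prediction

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))       # (B, T, H)
        # Last hidden state summarizes the sequence for classification.
        return self.tox_head(h[:, -1]), self.lm_head(h)

def multitask_loss(tox_logits, lm_logits, tox_labels, tokens, lm_weight=0.5):
    """Joint objective: classification loss plus a weighted LM loss.

    lm_weight is an illustrative hyperparameter, not a value from the paper.
    """
    ce = nn.CrossEntropyLoss()
    tox_loss = ce(tox_logits, tox_labels)
    # Next-token targets: shift the input left by one position.
    lm_loss = ce(lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)),
                 tokens[:, 1:].reshape(-1))
    return tox_loss + lm_weight * lm_loss
```

In practice the encoder would be a pretrained transformer rather than a toy GRU, but the structure is the same: both heads are trained together, so the language-modeling signal regularizes the classifier's representations.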