Utilizing subjectivity level to mitigate identity term bias in toxic comments classification

Zhixue Zhao, Ziqi Zhang, Frank Hopfgartner

Online Social Networks and Media, May 2022 (JCR Q1, Social Sciences). DOI: 10.1016/j.osnem.2022.100205

Citations: 1

Abstract

Toxic comment classification models are often found to be biased towards identity terms, i.e., terms characterizing a specific group of people such as “Muslim” and “black”. Such bias is commonly reflected in false positive predictions, i.e., non-toxic comments containing identity terms. In this work, we propose a novel approach to debias the model in toxic comment classification, leveraging the notion of the subjectivity level of a comment and the presence of identity terms. We hypothesize that toxic comments containing identity terms are more likely to be expressions of subjective feelings or opinions. Therefore, the subjectivity level of a comment containing identity terms can be helpful for classifying toxic comments and mitigating the identity term bias. To implement this idea, we propose a model based on BERT and study two different methods of measuring the subjectivity level. The first method uses a lexicon-based tool. The second method is based on the idea of calculating the embedding similarity between a comment and a relevant Wikipedia text of the identity term in the comment. We thoroughly evaluate our method on an extensive collection of four datasets collected from different social media platforms. Our results show that: (1) our models that incorporate both subjectivity and identity term features consistently outperform strong SOTA baselines, with our best performing model achieving an improvement in F1 of 4.75% on a Twitter dataset; (2) our idea of measuring subjectivity based on the similarity to the relevant Wikipedia text is very effective for toxic comment classification, as the model using it achieves the best performance on 3 out of 4 datasets while obtaining comparable performance on the remaining dataset. We further test our method on RoBERTa to evaluate its generality, and the results show an improvement in F1 of up to 1.29% (on a dataset from a white supremacist online forum).
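The second subjectivity measure described in the abstract — comparing a comment against the Wikipedia text of the identity term it mentions — can be illustrated with a toy sketch. Note the assumptions: the paper computes similarity between BERT embeddings, whereas the stand-in below uses plain bag-of-words cosine similarity, and the example texts, function names, and the `1 - similarity` scoring are all invented here for illustration, not taken from the paper.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors
    (a crude stand-in for the paper's BERT embedding similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def subjectivity_score(comment: str, wiki_text: str) -> float:
    """Higher score = less similar to the encyclopedic reference text,
    i.e. (under the paper's hypothesis) more subjective."""
    return 1.0 - cosine_similarity(comment, wiki_text)

# Invented example: a Wikipedia-style reference text for the identity
# term "Muslim", one opinionated comment, and one factual comment.
wiki = "muslim is a follower of the religion of islam"
opinion = "i honestly feel muslim people are treated unfairly here"
factual = "a muslim is a follower of islam"

opinion_score = subjectivity_score(opinion, wiki)
factual_score = subjectivity_score(factual, wiki)
```

Under this sketch the opinionated comment scores as more subjective than the factual one, mirroring the intuition the paper exploits: comments that stray further from the objective encyclopedic description of an identity term are more likely to express subjective feelings.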

Source journal: Online Social Networks and Media (Social Sciences – Communication). CiteScore: 10.60; self-citation rate: 0.00%; articles per year: 32; review time: 44 days.