Incorporating Subjectivity into Gendered Ambiguous Pronoun (GAP) Resolution using Style Transfer

Kartikey Pant, Tanvi Dadu
Published in: Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
DOI: https://doi.org/10.18653/v1/2022.gebnlp-1.28
Citations: 2

Abstract

The GAP dataset is a Wikipedia-based evaluation dataset for detecting gender bias in coreference resolution, and it contains mostly objective sentences. Because subjectivity is ubiquitous in everyday text, models need to be evaluated on both subjective and objective instances. In this work, we present GAP-Subjective, a new evaluation dataset for gender bias in coreference resolution that extends the coverage of the original GAP dataset by including subjective sentences. We outline the methodology used to create this dataset. First, we detect objective sentences and transfer them into their subjective variants using a sequence-to-sequence model. Second, we describe the thresholding techniques, based on fluency and content preservation, used to maintain the quality of the sentences. Third, we perform automated and human-based analysis of the style transfer and infer that the transferred sentences are of high quality. Finally, we benchmark a BERT-based model on both the GAP and GAP-Subjective datasets and analyze its predictive performance and gender bias.
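The thresholding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The `Candidate` fields, the score ranges, the default threshold values, and the fallback to the original sentence when a transfer fails the check are all assumptions made for illustration; the abstract does not specify how fluency or content preservation are scored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One objective sentence paired with its style-transferred variant.

    All fields are hypothetical; the scores are assumed to lie in [0, 1],
    with higher values meaning better quality.
    """
    original: str
    transferred: str
    fluency: float
    content_preservation: float


def select_sentences(
    candidates: List[Candidate],
    min_fluency: float = 0.5,   # assumed threshold, not from the paper
    min_content: float = 0.5,   # assumed threshold, not from the paper
) -> List[str]:
    """Keep a transferred sentence only if it clears both thresholds.

    Falling back to the original sentence is an assumption of this sketch.
    """
    selected = []
    for c in candidates:
        passes = (
            c.fluency >= min_fluency
            and c.content_preservation >= min_content
        )
        selected.append(c.transferred if passes else c.original)
    return selected


if __name__ == "__main__":
    examples = [
        Candidate("She wrote the book.", "She wrote the remarkable book.", 0.9, 0.8),
        Candidate("He won the race.", "He he won won race.", 0.2, 0.9),
    ]
    for sentence in select_sentences(examples):
        print(sentence)
```

Requiring both scores to clear their thresholds reflects the abstract's pairing of fluency with content preservation: a fluent sentence that drifts from the source meaning, or a faithful but disfluent one, would each be rejected.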