Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP): Latest Publications

Indigenous Language Revitalization and the Dilemma of Gender Bias
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.25
Oussama Hansal, N. Le, F. Sadat
{"title":"Indigenous Language Revitalization and the Dilemma of Gender Bias","authors":"Oussama Hansal, N. Le, F. Sadat","doi":"10.18653/v1/2022.gebnlp-1.25","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.25","url":null,"abstract":"Natural Language Processing (NLP), through its several applications, has been considered as one of the most valuable field in interdisciplinary researches, as well as in computer science. However, it is not without its flaws. One of the most common flaws is bias. This paper examines the main linguistic challenges of Inuktitut, an indigenous language of Canada, and focuses on gender bias identification and mitigation. We explore the unique characteristics of this language to help us understand the right techniques that can be used to identify and mitigate implicit biases. We use some methods to quantify the gender bias existing in Inuktitut word embeddings; then we proceed to mitigate the bias and evaluate the performance of the debiased embeddings. Next, we explain how approaches for detecting and reducing bias in English embeddings may be transferred to Inuktitut embeddings by properly taking into account the language’s particular characteristics. Next, we compare the effect of the debiasing techniques on Inuktitut and English. Finally, we highlight some future research directions which will further help to push the boundaries.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123987999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
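As a rough illustration of the kind of procedure the abstract above describes — estimating a gender direction in word embeddings, scoring words against it, and projecting it out — here is a minimal Python sketch in the spirit of Bolukbasi et al. (2016). The word lists, vector dimensionality, and random vectors are placeholders, not the authors' Inuktitut data or their exact method.

```python
# Minimal sketch: quantify and remove a gender direction from word vectors.
# Word lists and the `vectors` dict are illustrative placeholders only.
import numpy as np

def gender_direction(vectors, pairs):
    """Average difference vector over definitional pairs, e.g. (he, she), unit-normalized."""
    diffs = [vectors[a] - vectors[b] for a, b in pairs if a in vectors and b in vectors]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def bias_score(vectors, word, direction):
    """Cosine of the word vector with the gender direction (|score| ~ bias)."""
    v = vectors[word]
    return float(np.dot(v, direction) / np.linalg.norm(v))

def debias(vectors, words, direction):
    """Remove the component along the gender direction for the given words."""
    out = dict(vectors)
    for w in words:
        v = out[w]
        out[w] = v - np.dot(v, direction) * direction
    return out

# Toy usage with random vectors standing in for trained embeddings.
rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in ["he", "she", "doctor", "nurse"]}
d = gender_direction(vectors, [("he", "she")])
print(bias_score(vectors, "doctor", d))
debiased = debias(vectors, ["doctor", "nurse"], d)
print(bias_score(debiased, "doctor", d))  # ~0 after the projection
```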
HeteroCorpus: A Corpus for Heteronormative Language Detection
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.23
Juan Vásquez, G. Bel-Enguix, Scott Andersen, Sergio-Luis Ojeda-Trueba
{"title":"HeteroCorpus: A Corpus for Heteronormative Language Detection","authors":"Juan Vásquez, G. Bel-Enguix, Scott Andersen, Sergio-Luis Ojeda-Trueba","doi":"10.18653/v1/2022.gebnlp-1.23","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.23","url":null,"abstract":"In recent years, plenty of work has been done by the NLP community regarding gender bias detection and mitigation in language systems. Yet, to our knowledge, no one has focused on the difficult task of heteronormative language detection and mitigation. We consider this an urgent issue, since language technologies are growing increasingly present in the world and, as it has been proven by various studies, NLP systems with biases can create real-life adverse consequences for women, gender minorities and racial minorities and queer people. For these reasons, we propose and evaluate HeteroCorpus; a corpus created specifically for studying heterononormative language in English. Additionally, we propose a baseline set of classification experiments on our corpus, in order to show the performance of our corpus in classification tasks.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125761518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
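For context, a baseline classification experiment of the sort mentioned above can be as simple as a bag-of-words model. The sketch below uses scikit-learn with invented placeholder tweets and labels; it is not the authors' actual baseline and does not use HeteroCorpus data.

```python
# Illustrative baseline only: TF-IDF features + logistic regression on
# placeholder texts. A real run would load the annotated tweet corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["example tweet one", "example tweet two", "example tweet three", "example tweet four"]
labels = [0, 1, 0, 1]  # hypothetical labels: 1 = heteronormative language, 0 = not

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["another example tweet"]))
```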
A Taxonomy of Bias-Causing Ambiguities in Machine Translation
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.18
M. Mechura
{"title":"A Taxonomy of Bias-Causing Ambiguities in Machine Translation","authors":"M. Mechura","doi":"10.18653/v1/2022.gebnlp-1.18","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.18","url":null,"abstract":"This paper introduces a taxonomy of phenomena which cause bias in machine translation, covering gender bias (people being male and/or female), number bias (singular you versus plural you) and formality bias (informal you versus formal you). Our taxonomy is a formalism for describing situations in machine translation when the source text leaves some of these properties unspecified (eg. does not say whether doctor is male or female) but the target language requires the property to be specified (eg. because it does not have a gender-neutral word for doctor). The formalism described here is used internally by a web-based tool we have built for detecting and correcting bias in the output of any machine translator.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132411918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
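One way such a formalism could be rendered as a data structure is sketched below. The field names, enum categories, and the Spanish example are illustrative assumptions, not the representation actually used by the authors' tool; only the three bias types come from the abstract.

```python
# Hypothetical representation of a bias-causing ambiguity: the source leaves a
# property unspecified, the target language forces a choice among explicit forms.
from dataclasses import dataclass
from enum import Enum

class AmbiguityType(Enum):
    GENDER = "gender"        # e.g. "doctor": male or female?
    NUMBER = "number"        # e.g. "you": singular or plural?
    FORMALITY = "formality"  # e.g. "you": informal or formal?

@dataclass
class BiasCausingAmbiguity:
    source_text: str
    ambiguous_span: str
    ambiguity_type: AmbiguityType
    target_options: tuple[str, ...]  # forms the target language forces a choice among

example = BiasCausingAmbiguity(
    source_text="The doctor arrived early.",
    ambiguous_span="doctor",
    ambiguity_type=AmbiguityType.GENDER,
    target_options=("el médico", "la médica"),  # Spanish forces a gender choice
)
print(example)
```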
Uncertainty and Inclusivity in Gender Bias Annotation: An Annotation Taxonomy and Annotated Datasets of British English Text
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.4
Lucy Havens, B. Alex, Benjamin Bach, Melissa Mhairi Terras
{"title":"Uncertainty and Inclusivity in Gender Bias Annotation: An Annotation Taxonomy and Annotated Datasets of British English Text","authors":"Lucy Havens, B. Alex, Benjamin Bach, Melissa Mhairi Terras","doi":"10.18653/v1/2022.gebnlp-1.4","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.4","url":null,"abstract":"Mitigating harms from gender biased language in Natural Language Processing (NLP) systems remains a challenge, and the situated nature of language means bias is inescapable in NLP data. Though efforts to mitigate gender bias in NLP are numerous, they often vaguely define gender and bias, only consider two genders, and do not incorporate uncertainty into models. To address these limitations, in this paper we present a taxonomy of gender biased language and apply it to create annotated datasets. We created the taxonomy and annotated data with the aim of making gender bias in language transparent. If biases are communicated clearly, varieties of biased language can be better identified and measured. Our taxonomy contains eleven types of gender biases inclusive of people whose gender expressions do not fit into the binary conceptions of woman and man, and whose gender differs from that they were assigned at birth, while also allowing annotators to document unknown gender information. The taxonomy and annotated data will, in future work, underpin analysis and more equitable language model development.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129222082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.27
Jaimeen Ahn, Hwaran Lee, Jinhwa Kim, Alice Oh
{"title":"Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT","authors":"Jaimeen Ahn, Hwaran Lee, Jinhwa Kim, Alice Oh","doi":"10.18653/v1/2022.gebnlp-1.27","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.27","url":null,"abstract":"Knowledge distillation is widely used to transfer the language understanding of a large model to a smaller model.However, after knowledge distillation, it was found that the smaller model is more biased by gender compared to the source large model.This paper studies what causes gender bias to increase after the knowledge distillation process.Moreover, we suggest applying a variant of the mixup on knowledge distillation, which is used to increase generalizability during the distillation process, not for augmentation.By doing so, we can significantly reduce the gender bias amplification after knowledge distillation.We also conduct an experiment on the GLUE benchmark to demonstrate that even if the mixup is applied, it does not have a significant adverse effect on the model’s performance.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125463569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
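To make the idea concrete, the sketch below shows one plausible way to fold mixup into a distillation step: two input representations are mixed with a Beta-sampled coefficient and the student is trained to match the teacher's soft targets on the mixed input. This is a generic PyTorch illustration with toy linear models, not the specific mixup variant or DistilBERT setup used in the paper.

```python
# Generic sketch of mixup inside a distillation objective (assumed setup, not
# the paper's exact recipe): mix features, distill on the mixed batch.
import torch
import torch.nn.functional as F

def mixup_distill_loss(student, teacher, x1, x2, alpha=0.4, temperature=2.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x1 + (1.0 - lam) * x2            # mixed input features
    with torch.no_grad():
        t_logits = teacher(x_mix)                  # teacher targets on the mixed input
    s_logits = student(x_mix)
    return F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

# Toy usage with linear layers standing in for the teacher and student encoders.
teacher = torch.nn.Linear(16, 3)
student = torch.nn.Linear(16, 3)
x1, x2 = torch.randn(8, 16), torch.randn(8, 16)
loss = mixup_distill_loss(student, teacher, x1, x2)
loss.backward()
print(float(loss))
```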
Incorporating Subjectivity into Gendered Ambiguous Pronoun (GAP) Resolution using Style Transfer
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.28
Kartikey Pant, Tanvi Dadu
{"title":"Incorporating Subjectivity into Gendered Ambiguous Pronoun (GAP) Resolution using Style Transfer","authors":"Kartikey Pant, Tanvi Dadu","doi":"10.18653/v1/2022.gebnlp-1.28","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.28","url":null,"abstract":"The GAP dataset is a Wikipedia-based evaluation dataset for gender bias detection in coreference resolution, containing mostly objective sentences. Since subjectivity is ubiquitous in our daily texts, it becomes necessary to evaluate models for both subjective and objective instances. In this work, we present a new evaluation dataset for gender bias in coreference resolution, GAP-Subjective, which increases the coverage of the original GAP dataset by including subjective sentences. We outline the methodology used to create this dataset. Firstly, we detect objective sentences and transfer them into their subjective variants using a sequence-to-sequence model. Secondly, we outline the thresholding techniques based on fluency and content preservation to maintain the quality of the sentences. Thirdly, we perform automated and human-based analysis of the style transfer and infer that the transferred sentences are of high quality. Finally, we benchmark both GAP and GAP-Subjective datasets using a BERT-based model and analyze its predictive performance and gender bias.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125489156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
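A quality filter of the kind described above (keep a transferred sentence only if it is fluent and preserves the source content) might look like the sketch below. The GPT-2 perplexity fluency score, token-overlap content measure, and threshold values are assumptions for illustration, not the paper's exact scoring or cutoffs.

```python
# Sketch of fluency + content-preservation thresholding for style-transferred
# sentences, under assumed scoring choices (GPT-2 perplexity, token overlap).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence):
    """Fluency proxy: lower perplexity under GPT-2 means more fluent."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss
    return float(torch.exp(loss))

def content_overlap(a, b):
    """Content-preservation proxy: Jaccard overlap of lowercased tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

def keep(source, transferred, max_ppl=80.0, min_overlap=0.5):
    return perplexity(transferred) <= max_ppl and content_overlap(source, transferred) >= min_overlap

print(keep("The senator praised her colleague.",
           "The senator warmly praised her wonderful colleague."))
```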
Analysis of Gender Bias in Social Perception and Judgement Using Chinese Word Embeddings
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.2
Jiali Li, Shucheng Zhu, Ying Liu, Pengyuan Liu
{"title":"Analysis of Gender Bias in Social Perception and Judgement Using Chinese Word Embeddings","authors":"Jiali Li, Shucheng Zhu, Ying Liu, Pengyuan Liu","doi":"10.18653/v1/2022.gebnlp-1.2","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.2","url":null,"abstract":"Gender is a construction in line with social perception and judgment. An important means of this construction is through languages. When natural language processing tools, such as word embeddings, associate gender with the relevant categories of social perception and judgment, it is likely to cause bias and harm to those groups that do not conform to the mainstream social perception and judgment. Using 12,251 Chinese word embeddings as intermedium, this paper studies the relationship between social perception and judgment categories and gender. The results reveal that these grammatical gender-neutral Chinese word embeddings show a certain gender bias, which is consistent with the mainstream society’s perception and judgment of gender. Men are judged by their actions and perceived as bad, easily-disgusted, bad-tempered and rational roles while women are judged by their appearances and perceived as perfect, either happy or sad, and emotional roles.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122421654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
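One common way to measure this kind of association is to compare how close a trait word sits to male- versus female-gendered anchor words in embedding space. The sketch below uses illustrative English placeholder words and random vectors; the paper itself works with trained Chinese word embeddings and categories of social perception and judgment.

```python
# Sketch of a relative gender-association score for a trait word, under the
# assumption that associations are measured by mean cosine similarity.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_association(vectors, trait, male_words, female_words):
    """Positive = trait closer to male anchors; negative = closer to female anchors."""
    m = np.mean([cosine(vectors[trait], vectors[w]) for w in male_words])
    f = np.mean([cosine(vectors[trait], vectors[w]) for w in female_words])
    return m - f

# Toy usage: random vectors stand in for trained embeddings.
rng = np.random.default_rng(1)
vocab = ["he", "man", "she", "woman", "rational", "emotional"]
vectors = {w: rng.normal(size=100) for w in vocab}
print(gender_association(vectors, "rational", ["he", "man"], ["she", "woman"]))
```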
Evaluating Gender Bias Transfer from Film Data
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.24
Amanda Bertsch, Ashley Oh, Sanika Natu, Swetha Gangu, A. Black, Emma Strubell
{"title":"Evaluating Gender Bias Transfer from Film Data","authors":"Amanda Bertsch, Ashley Oh, Sanika Natu, Swetha Gangu, A. Black, Emma Strubell","doi":"10.18653/v1/2022.gebnlp-1.24","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.24","url":null,"abstract":"Films are a rich source of data for natural language processing. OpenSubtitles (Lison and Tiedemann, 2016) is a popular movie script dataset, used for training models for tasks such as machine translation and dialogue generation. However, movies often contain biases that reflect society at the time, and these biases may be introduced during pre-training and influence downstream models. We perform sentiment analysis on template infilling (Kurita et al., 2019) and the Sentence Embedding Association Test (May et al., 2019) to measure how BERT-based language models change after continued pre-training on OpenSubtitles. We consider gender bias as a primary motivating case for this analysis, while also measuring other social biases such as disability. We show that sentiment analysis on template infilling is not an effective measure of bias due to the rarity of disability and gender identifying tokens in the movie dialogue. We extend our analysis to a longitudinal study of bias in film dialogue over the last 110 years and find that continued pre-training on OpenSubtitles encodes additional bias into BERT. We show that BERT learns associations that reflect the biases and representation of each film era, suggesting that additional care must be taken when using historical data.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116715689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
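Template infilling in the spirit of Kurita et al. (2019) can be probed with a masked language model as sketched below. The template and target words are illustrative; the paper pairs this probing with sentiment analysis and the Sentence Embedding Association Test rather than relying on raw fill probabilities alone.

```python
# Sketch: compare how strongly a masked LM fills a template with gendered
# tokens. Template and targets are assumptions for illustration.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def gendered_fill_scores(template, targets=("he", "she")):
    """Return the model's fill probability for each target token in the template."""
    results = fill(template, targets=list(targets))
    return {r["token_str"]: r["score"] for r in results}

print(gendered_fill_scores("[MASK] is a brilliant scientist."))
```

Running the same probe before and after continued pre-training on a corpus such as OpenSubtitles would show how the fill probabilities shift.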
On Gender Biases in Offensive Language Classification Models
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.19
Sanjana Marcé, Adam Poliak
{"title":"On Gender Biases in Offensive Language Classification Models","authors":"Sanjana Marcé, Adam Poliak","doi":"10.18653/v1/2022.gebnlp-1.19","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.19","url":null,"abstract":"We explore whether neural Natural Language Processing models trained to identify offensive language in tweets contain gender biases. We add historically gendered and gender ambiguous American names to an existing offensive language evaluation set to determine whether models? predictions are sensitive or robust to gendered names. While we see some evidence that these models might be prone to biased stereotypes that men use more offensive language than women, our results indicate that these models? binary predictions might not greatly change based upon gendered names.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"531 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123457894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
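The perturbation idea is straightforward to sketch: substitute gendered names into otherwise identical inputs and check whether the classifier's prediction changes. The name lists and the trivial stand-in classifier below are placeholders, not the paper's models or name lists.

```python
# Sketch of a name-substitution sensitivity check for an offensive-language
# classifier. The classifier and name lists are illustrative stand-ins.
MALE_NAMES = ["James", "Robert", "Michael"]
FEMALE_NAMES = ["Mary", "Patricia", "Jennifer"]

def toy_classifier(text):
    """Stand-in for a trained model: returns 1 if the text is 'offensive'."""
    return int("idiot" in text.lower())

def prediction_flips(template, classifier):
    """True if predictions differ across male- vs. female-name fillings of the template."""
    preds_m = {classifier(template.format(name=n)) for n in MALE_NAMES}
    preds_f = {classifier(template.format(name=n)) for n in FEMALE_NAMES}
    return preds_m != preds_f or len(preds_m) > 1 or len(preds_f) > 1

print(prediction_flips("{name} said something on the timeline today.", toy_classifier))
```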
An Empirical Study on the Fairness of Pre-trained Word Embeddings
Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP) · Pub Date: 2022 · DOI: 10.18653/v1/2022.gebnlp-1.15
E. Sesari, Max Hort, Federica Sarro
{"title":"An Empirical Study on the Fairness of Pre-trained Word Embeddings","authors":"E. Sesari, Max Hort, Federica Sarro","doi":"10.18653/v1/2022.gebnlp-1.15","DOIUrl":"https://doi.org/10.18653/v1/2022.gebnlp-1.15","url":null,"abstract":"Pre-trained word embedding models are easily distributed and applied, as they alleviate users from the effort to train models themselves. With widely distributed models, it is important to ensure that they do not exhibit undesired behaviour, such as biases against population groups. For this purpose, we carry out an empirical study on evaluating the bias of 15 publicly available, pre-trained word embeddings model based on three training algorithms (GloVe, word2vec, and fastText) with regard to four bias metrics (WEAT, SEMBIAS,DIRECT BIAS, and ECT). The choice of word embedding models and bias metrics is motivated by a literature survey over 37 publications which quantified bias on pre-trained word embeddings. Our results indicate that fastText is the least biased model (in 8 out of 12 cases) and small vector lengths lead to a higher bias.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122616623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
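Of the four metrics named above, WEAT is perhaps the most widely reimplemented. A minimal sketch of its effect size (Caliskan et al., 2017) over a dict of word vectors follows; the word sets and random vectors are placeholders standing in for loaded GloVe/word2vec/fastText embeddings.

```python
# Sketch of the WEAT effect size between target sets X, Y and attribute sets A, B.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(w, A, B, vec):
    """s(w, A, B): mean cosine with attribute set A minus mean cosine with set B."""
    return (np.mean([cosine(vec[w], vec[a]) for a in A])
            - np.mean([cosine(vec[w], vec[b]) for b in B]))

def weat_effect_size(X, Y, A, B, vec):
    """Cohen's-d-style effect size over the two target sets X and Y."""
    sx = [assoc(x, A, B, vec) for x in X]
    sy = [assoc(y, A, B, vec) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Toy usage: random vectors stand in for a pre-trained embedding model.
rng = np.random.default_rng(2)
words = ["career", "office", "family", "home", "he", "man", "she", "woman"]
vec = {w: rng.normal(size=300) for w in words}
print(weat_effect_size(["career", "office"], ["family", "home"],
                       ["he", "man"], ["she", "woman"], vec))
```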