Cultural Differences in Bias? Origin and Gender Bias in Pre-Trained German and French Word Embeddings

Mascha Kurpicz-Briki
DOI: 10.24451/ARBOR.11922 (https://doi.org/10.24451/ARBOR.11922)
Publication date: 2020-06-24
Publication type: Journal Article
Citations: 16

Abstract

Smart applications often rely on training data in the form of text. If that training data is biased, the decisions made by these applications may be unfair. Common training data has been shown to be biased with respect to various minority groups. However, there is no generic algorithm for determining the fairness of training data. One existing approach is to measure gender bias using word embeddings. Most research in this field has been dedicated to the English language. In this work, we find bias with respect to gender and origin in both German and French word embeddings. In particular, we find that real-world biases and stereotypes from the 18th century are still present in today’s word embeddings. Furthermore, we show that gender bias in German takes a different form than in English, and there are indications that bias differs across cultures, which needs to be considered when analyzing texts and word embeddings in different languages.
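The abstract does not say which measure of bias was used. As an illustration only, the following minimal sketch computes an effect size in the style of the Word Embedding Association Test (WEAT), one commonly used way to measure bias in word embeddings. The function names and the two-dimensional toy vectors are hypothetical stand-ins; a real analysis would look up vectors for actual German or French words in a pre-trained embedding model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """Mean cosine similarity of w to attribute set A minus that to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT-style effect size: how much more strongly target set X is
    associated with A (versus B) than target set Y is."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    pooled_std = np.std(s_x + s_y, ddof=1)  # sample std over all target words
    return float((np.mean(s_x) - np.mean(s_y)) / pooled_std)

# Hypothetical toy vectors (assumption: not taken from any real embedding).
X = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]    # target set 1, e.g. one group of names
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]    # target set 2, e.g. another group of names
A = [np.array([1.0, 0.0]), np.array([0.95, 0.05])]  # attribute set 1, e.g. career terms
B = [np.array([0.0, 1.0]), np.array([0.05, 0.95])]  # attribute set 2, e.g. family terms

effect = weat_effect_size(X, Y, A, B)
print(f"effect size: {effect:.2f}")
```

A positive value means that the first target set sits closer to the first attribute set than the second target set does; swapping the two target sets flips the sign. An effect size alone says nothing about significance, which in WEAT-style analyses is usually assessed with a permutation test.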
Source journal: ARBOR-CIENCIA PENSAMIENTO Y CULTURA (category: HUMANITIES, MULTIDISCIPLINARY)
CiteScore: 0.60
Self-citation rate: 0.00%
Articles published: 21
Review time: 48 weeks
Journal description: Arbor is a bimonthly journal publishing original articles on science, thought and culture. By examining different topics with a rigorous scientific approach, Arbor aims to serve Spanish society and the scientific community by providing information, updates, reflection and debate on subjects of current interest. Arbor is among the oldest journals published by CSIC and is open to researchers and to creators and managers of culture, both Spanish and foreign.