Gender Asymmetry of Visegrád Group Languages as Reflected by Word Embeddings

R. Garabík, Jana Wachtarczyková
Journal: Journal of Linguistics/Jazykovedný časopis
Published: 2022-12-01
DOI: 10.2478/jazcas-2023-0013 (https://doi.org/10.2478/jazcas-2023-0013)

Abstract

Word embeddings have become a standard method in natural language processing, largely thanks to the availability of large language corpora. The models effectively reflect the semantic relationships between words without any additional linguistic input. Recently, more emphasis has been placed on interpreting the seemingly discriminatory results of some queries, with the goal of de-biasing language models. However, if we consider the vector space to be a reasonably valid model of a linguistic semantic space, do the asymmetry and subsequent discrimination in word embeddings not reflect the (average) discriminatory tendencies inherent in the language? This article explores word embedding models for the Visegrád group languages and applies basic vector arithmetic to demonstrate the language asymmetry present in the models. It is well known that in English models, vector transfers yield eerily accurate predictions when swapping genders (the famous king - man + woman = queen), but these transfers also yield rather uncomplimentary roles for certain occupations (doctor - man + woman = nurse, or computer programmer - man + woman = homemaker). The article explores similar transfers in models of the V4 languages: Slovak, Czech, Polish, and Hungarian. Given Hungarian gender neutrality, the strong Polish generic masculine, and the close parallels between Slovak and Czech, we hope to uncover interesting similarities and differences in gender asymmetry in these languages, based on real language data.
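The vector arithmetic mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the three-dimensional vectors, the `TOY_VECTORS` table, and the `analogy` helper are hypothetical and hand-made so that the arithmetic is easy to follow. Real embeddings are learned from corpora, have hundreds of dimensions, and give far noisier answers.

```python
import math

# Hypothetical toy vectors with axes (royalty, male, female).
# These values are invented for illustration, not taken from any trained model.
TOY_VECTORS = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "child": [0.0, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, base, minus, plus):
    """Return the word whose vector is closest to base - minus + plus.

    The three query words are excluded from the candidates, which is the
    usual convention for this kind of analogy query.
    """
    target = [
        b - m + p
        for b, m, p in zip(vectors[base], vectors[minus], vectors[plus])
    ]
    candidates = [w for w in vectors if w not in {base, minus, plus}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy(TOY_VECTORS, "king", "man", "woman"))  # prints: queen
```

With a trained model, the same query is typically run through a library rather than by hand; in gensim, for example, it would be `KeyedVectors.most_similar(positive=["king", "woman"], negative=["man"])`. Whether the article used that library is not stated in the abstract.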