Gender Asymmetry of Visegrád Group Languages as Reflected by Word Embeddings

Journal of Linguistics/Jazykovedný časopis. Published: 2022-12-01. DOI: 10.2478/jazcas-2023-0013

R. Garabík, Jana Wachtarczyková


Citation count: 0



Abstract

Today, word embeddings have become a standard method in natural language processing, largely due to the availability of large language corpora. The models effectively reflect the semantic relationships between words without any additional linguistic input. Recently, more emphasis has been placed on interpreting the seemingly discriminatory results of some queries, with the goal of de-biasing language models. However, if we consider the vector space to be a reasonably valid model of a linguistic semantic space, do the asymmetry and resulting discrimination in word embeddings not reflect the (average) discriminatory tendencies inherent in the language itself? This article explores word embedding models for the Visegrád Group languages and applies basic vector arithmetic to demonstrate the fundamental language asymmetry present in the models. It is well known that in English models, vector transfers yield eerily accurate predictions when swapping genders (the famous king − man + woman = queen), but these transfers also assign rather uncomplimentary roles to certain occupations (doctor − man + woman = nurse, or computer programmer − man + woman = homemaker). The article explores similar transfers in models of the V4 languages: Slovak, Czech, Polish, and Hungarian. Given the gender neutrality of Hungarian, the strong generic masculine of Polish, and the close parallels between Slovak and Czech, we hope to uncover interesting similarities and differences in gender asymmetry across these languages, based on real language data.
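The "vector transfer" queries mentioned in the abstract (king − man + woman = queen) are usually answered by computing the offset vector and returning the nearest vocabulary word by cosine similarity, excluding the three query words. The abstract does not spell out the authors' exact procedure, so the following is only a minimal sketch of that common approach. The 3-dimensional vectors in `TOY_EMBEDDINGS` are hypothetical and hand-made for illustration; real models are trained on corpora and have hundreds of dimensions. The helper names `cosine` and `analogy` are likewise illustrative.

```python
import math

# Hypothetical toy vectors, hand-made for illustration only (not from any
# trained model); real word embeddings are learned from large corpora.
TOY_EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(a, b, c, embeddings):
    """Answer 'a - b + c = ?' with the word whose vector is most
    cosine-similar to vec(a) - vec(b) + vec(c).

    The query words themselves are excluded from the candidates,
    which is a common convention for analogy queries.
    """
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


if __name__ == "__main__":
    print(analogy("king", "man", "woman", TOY_EMBEDDINGS))  # prints "queen"
```

Here king − man + woman gives [0.9, 0.0, 0.9], whose cosine similarity is about 0.99 with "queen" and 0.5 with "apple". An occupational query such as doctor − man + woman has exactly the same shape; with a real model it simply surfaces whatever neighbour the training data places closest, which is how the asymmetries discussed above become visible. Libraries such as gensim expose this kind of query through `KeyedVectors.most_similar(positive=[...], negative=[...])`, which works on unit-normalized vectors.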

