Gender Asymmetry of Visegrád Group Languages as Reflected by Word Embeddings
Authors: R. Garabík, Jana Wachtarczyková
Journal: Journal of Linguistics/Jazykovedný časopis
Publication date: 2022-12-01
DOI: https://doi.org/10.2478/jazcas-2023-0013
Abstract. Word embeddings have become a standard method in natural language processing, largely owing to the availability of large language corpora. The models capture semantic relationships between words without any additional linguistic input. Recently, more emphasis has been placed on interpreting the seemingly discriminatory results of some queries, with the goal of de-biasing language models. However, if we consider the vector space to be a reasonably valid model of a linguistic semantic space, do the asymmetry and subsequent discrimination in word embeddings not reflect the (average) discriminatory tendencies inherent in the language? This article explores word embedding models for the Visegrád Group languages and applies basic vector arithmetic to demonstrate the basic language asymmetry present in the models. It is well known that in English models, vector transfers yield eerily accurate predictions when swapping genders (the famous king – man + woman = queen), but they also yield rather uncomplimentary roles for certain occupations (doctor – man + woman = nurse, or computer programmer – man + woman = homemaker). The article explores similar transfers in models of the V4 languages: Slovak, Czech, Polish, and Hungarian. Given Hungarian gender neutrality, the strong Polish generic masculine, and the close parallels between Slovak and Czech, we hope to uncover interesting similarities and differences in gender asymmetry in these languages, based on real language data.
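The vector transfer mentioned in the abstract (king – man + woman = queen) can be illustrated with a minimal sketch. The three-dimensional vectors below are hypothetical toy values chosen by hand for illustration, not taken from any trained model or from the article, and the helper names `cosine` and `analogy` are likewise assumptions. The sketch forms the vector a – b + c and returns the vocabulary word with the highest cosine similarity to it, excluding the three query words themselves, as analogy queries commonly do.

```python
import math

# Hypothetical toy vectors (not from a trained model); the dimensions loosely
# stand for "royalty", "maleness" and "femaleness".
TOY_VECTORS = {
    "king":     [0.9, 0.9, 0.1],
    "queen":    [0.9, 0.1, 0.9],
    "prince":   [0.7, 0.9, 0.1],
    "princess": [0.7, 0.1, 0.9],
    "man":      [0.1, 0.9, 0.1],
    "woman":    [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# king - man + woman: prints "queen" for these toy vectors
print(analogy(TOY_VECTORS, "king", "man", "woman"))
```

With real embeddings the same query runs over a vocabulary of many thousands of words, so the nearest neighbour depends on co-occurrence patterns in the training corpus. That is where occupation queries such as doctor – man + woman can surface asymmetric results.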