Matej Ulčar, Anka Supej, M. Robnik-Sikonja, Senja Pollak
{"title":"斯洛文尼亚语和克罗地亚语在性别职业类比方面的词嵌入","authors":"Matej Ulčar, Anka Supej, M. Robnik-Sikonja, Senja Pollak","doi":"10.4312/SLO2.0.2021.1.26-59","DOIUrl":null,"url":null,"abstract":"In recent years, the use of deep neural networks and dense vector embeddings for text representation have led to excellent results in the field of computational understanding of natural language. It has also been shown that word embeddings often capture gender, racial and other types of bias. The article focuses on evaluating Slovene and Croatian word embeddings in terms of gender bias using word analogy calculations. We compiled a list of masculine and feminine nouns for occupations in Slovene and evaluated the gender bias of fastText, word2vec and ELMo embeddings with different configurations and different approaches to analogy calculations. The lowest occupational gender bias was observed with the fastText embeddings. Similarly, we compared different fastText embeddings on Croatian occupational analogies.","PeriodicalId":371035,"journal":{"name":"Slovenščina 2.0: empirical, applied and interdisciplinary research","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Slovene and Croatian word embeddings in terms of gender occupational analogies\",\"authors\":\"Matej Ulčar, Anka Supej, M. Robnik-Sikonja, Senja Pollak\",\"doi\":\"10.4312/SLO2.0.2021.1.26-59\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the use of deep neural networks and dense vector embeddings for text representation have led to excellent results in the field of computational understanding of natural language. It has also been shown that word embeddings often capture gender, racial and other types of bias. The article focuses on evaluating Slovene and Croatian word embeddings in terms of gender bias using word analogy calculations. We compiled a list of masculine and feminine nouns for occupations in Slovene and evaluated the gender bias of fastText, word2vec and ELMo embeddings with different configurations and different approaches to analogy calculations. The lowest occupational gender bias was observed with the fastText embeddings. Similarly, we compared different fastText embeddings on Croatian occupational analogies.\",\"PeriodicalId\":371035,\"journal\":{\"name\":\"Slovenščina 2.0: empirical, applied and interdisciplinary research\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Slovenščina 2.0: empirical, applied and interdisciplinary research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4312/SLO2.0.2021.1.26-59\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Slovenščina 2.0: empirical, applied and interdisciplinary research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4312/SLO2.0.2021.1.26-59","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Slovene and Croatian word embeddings in terms of gender occupational analogies
In recent years, the use of deep neural networks and dense vector embeddings for text representation have led to excellent results in the field of computational understanding of natural language. It has also been shown that word embeddings often capture gender, racial and other types of bias. The article focuses on evaluating Slovene and Croatian word embeddings in terms of gender bias using word analogy calculations. We compiled a list of masculine and feminine nouns for occupations in Slovene and evaluated the gender bias of fastText, word2vec and ELMo embeddings with different configurations and different approaches to analogy calculations. The lowest occupational gender bias was observed with the fastText embeddings. Similarly, we compared different fastText embeddings on Croatian occupational analogies.