Evaluating Gender Bias in Pre-trained Filipino FastText Embeddings
L. C. Gamboa, Maria Regina Justina Estuar
2023 International Conference on IT Innovation and Knowledge Discovery (ITIKD), published 2023-03-08
https://doi.org/10.1109/ITIKD56332.2023.10100022
Past studies show that word embeddings can learn the gender biases that human agents introduce into the textual corpora used to train these models. However, some non-English embeddings have been shown not to capture such biases in their word representations. This study therefore aimed to answer the question: Does the publicly available Filipino FastText word embedding contain gender bias? To answer it, various iterations of the Word Embedding Association Test and principal component analysis were conducted on the embedding. Results show that the Tagalog FastText embedding not only represents gendered semantic information properly but also captures biases about masculinity and femininity collectively held by Filipinos. Specifically, the embedding most strongly associates the female with nouns pertaining to domestic and caregiving roles, and the male with verbs relating to strength and the body. The study's findings can help determine the next steps needed to reduce or eliminate bias in Filipino embeddings.
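For readers unfamiliar with the Word Embedding Association Test, the following is a minimal sketch of its effect-size computation as it is commonly defined in the literature. It is not the authors' code. The function names, the 2-D toy vectors, and the choice of sample standard deviation (`ddof=1`) are all assumptions made for illustration; a real analysis would look up the vectors for the target and attribute words in the FastText model instead.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])


def weat_effect_size(X, Y, A, B):
    """Standardized difference in association between target sets X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Sample standard deviation over all target words (ddof=1 is an
    # assumption here; implementations differ on this choice).
    pooled_std = np.std(s_x + s_y, ddof=1)
    return float((np.mean(s_x) - np.mean(s_y)) / pooled_std)


# Hypothetical 2-D toy vectors standing in for real embedding rows.
A = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]  # attribute set A
B = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]  # attribute set B
X = [np.array([0.8, 0.2]), np.array([0.7, 0.3])]  # targets leaning toward A
Y = [np.array([0.2, 0.8]), np.array([0.3, 0.7])]  # targets leaning toward B

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value means the X targets sit closer to attribute set A than the Y targets do. Swapping X and Y flips the sign. In a gender-bias setting, A and B would typically be female and male attribute words, and X and Y the two concept word lists being compared.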