{"title":"汉字成分的无监督缓解性别偏见——以中文词嵌入为例","authors":"Xiuying Chen, Mingzhe Li, Rui Yan, Xin Gao, Xiangliang Zhang","doi":"10.18653/v1/2022.gebnlp-1.14","DOIUrl":null,"url":null,"abstract":"Word embeddings learned from massive text collections have demonstrated significant levels of discriminative biases.However, debias on the Chinese language, one of the most spoken languages, has been less explored.Meanwhile, existing literature relies on manually created supplementary data, which is time- and energy-consuming.In this work, we propose the first Chinese Gender-neutral word Embedding model (CGE) based on Word2vec, which learns gender-neutral word embeddings without any labeled data.Concretely, CGE utilizes and emphasizes the rich feminine and masculine information contained in radicals, i.e., a kind of component in Chinese characters, during the training procedure.This consequently alleviates discriminative gender biases.Experimental results on public benchmark datasets show that our unsupervised method outperforms the state-of-the-art supervised debiased word embedding models without sacrificing the functionality of the embedding model.","PeriodicalId":161909,"journal":{"name":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Unsupervised Mitigating Gender Bias by Character Components: A Case Study of Chinese Word Embedding\",\"authors\":\"Xiuying Chen, Mingzhe Li, Rui Yan, Xin Gao, Xiangliang Zhang\",\"doi\":\"10.18653/v1/2022.gebnlp-1.14\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Word embeddings learned from massive text collections have demonstrated significant levels of discriminative biases.However, debias on the Chinese language, one of the most spoken languages, has been less explored.Meanwhile, existing literature relies on manually created supplementary data, which is time- and energy-consuming.In this work, we propose the first Chinese Gender-neutral word Embedding model (CGE) based on Word2vec, which learns gender-neutral word embeddings without any labeled data.Concretely, CGE utilizes and emphasizes the rich feminine and masculine information contained in radicals, i.e., a kind of component in Chinese characters, during the training procedure.This consequently alleviates discriminative gender biases.Experimental results on public benchmark datasets show that our unsupervised method outperforms the state-of-the-art supervised debiased word embedding models without sacrificing the functionality of the embedding model.\",\"PeriodicalId\":161909,\"journal\":{\"name\":\"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)\",\"volume\":\"168 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2022.gebnlp-1.14\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.gebnlp-1.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Unsupervised Mitigating Gender Bias by Character Components: A Case Study of Chinese Word Embedding
Word embeddings learned from massive text collections have demonstrated significant levels of discriminative biases.However, debias on the Chinese language, one of the most spoken languages, has been less explored.Meanwhile, existing literature relies on manually created supplementary data, which is time- and energy-consuming.In this work, we propose the first Chinese Gender-neutral word Embedding model (CGE) based on Word2vec, which learns gender-neutral word embeddings without any labeled data.Concretely, CGE utilizes and emphasizes the rich feminine and masculine information contained in radicals, i.e., a kind of component in Chinese characters, during the training procedure.This consequently alleviates discriminative gender biases.Experimental results on public benchmark datasets show that our unsupervised method outperforms the state-of-the-art supervised debiased word embedding models without sacrificing the functionality of the embedding model.