{"title":"从人名的情感关联中去除词嵌入的偏见","authors":"C. Hube, Maximilian Idahl, B. Fetahu","doi":"10.1145/3336191.3371779","DOIUrl":null,"url":null,"abstract":"Word embeddings, trained through models like skip-gram, have shown to be prone to capturing the biases from the training corpus, e.g. gender bias. Such biases are unwanted as they spill in downstream tasks, thus, leading to discriminatory behavior. In this work, we address the problem of prior sentiment associated with names in word embeddings where for a given name representation (e.g. \"Smith\"), a sentiment classifier will categorize it as either positive or negative. We propose DebiasEmb, a skip-gram based word embedding approach that, for a given oracle sentiment classification model, will debias the name representations, such that they cannot be associated with either positive or negative sentiment. Evaluation on standard word embedding benchmarks and a downstream analysis show that our approach is able to maintain a high quality of embeddings and at the same time mitigate sentiment bias in name embeddings.","PeriodicalId":319008,"journal":{"name":"Proceedings of the 13th International Conference on Web Search and Data Mining","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Debiasing Word Embeddings from Sentiment Associations in Names\",\"authors\":\"C. Hube, Maximilian Idahl, B. Fetahu\",\"doi\":\"10.1145/3336191.3371779\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Word embeddings, trained through models like skip-gram, have shown to be prone to capturing the biases from the training corpus, e.g. gender bias. Such biases are unwanted as they spill in downstream tasks, thus, leading to discriminatory behavior. In this work, we address the problem of prior sentiment associated with names in word embeddings where for a given name representation (e.g. \\\"Smith\\\"), a sentiment classifier will categorize it as either positive or negative. We propose DebiasEmb, a skip-gram based word embedding approach that, for a given oracle sentiment classification model, will debias the name representations, such that they cannot be associated with either positive or negative sentiment. Evaluation on standard word embedding benchmarks and a downstream analysis show that our approach is able to maintain a high quality of embeddings and at the same time mitigate sentiment bias in name embeddings.\",\"PeriodicalId\":319008,\"journal\":{\"name\":\"Proceedings of the 13th International Conference on Web Search and Data Mining\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 13th International Conference on Web Search and Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3336191.3371779\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 13th International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3336191.3371779","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Debiasing Word Embeddings from Sentiment Associations in Names
Word embeddings, trained through models like skip-gram, have shown to be prone to capturing the biases from the training corpus, e.g. gender bias. Such biases are unwanted as they spill in downstream tasks, thus, leading to discriminatory behavior. In this work, we address the problem of prior sentiment associated with names in word embeddings where for a given name representation (e.g. "Smith"), a sentiment classifier will categorize it as either positive or negative. We propose DebiasEmb, a skip-gram based word embedding approach that, for a given oracle sentiment classification model, will debias the name representations, such that they cannot be associated with either positive or negative sentiment. Evaluation on standard word embedding benchmarks and a downstream analysis show that our approach is able to maintain a high quality of embeddings and at the same time mitigate sentiment bias in name embeddings.