Alexandre Puttick, Catherine Ikae, Carlotta Rigotti, Eduard Fosch-Villaronga, Mark W. Kharas, Roger A. Søraa, Mascha Kurpicz-Briki
Artificial Intelligence Review, vol. 58, no. 12 (published 2025-10-08). DOI: 10.1007/s10462-025-11375-8. https://link.springer.com/article/10.1007/s10462-025-11375-8
A systematic review of bias detection methods for non-English word embeddings and language models
Bias is a major limitation of machine learning and artificial intelligence applications. Societal stereotypes are reflected in many types of applications, including image generation, machine translation, and CV ranking. This is also the case for language models and word embeddings, which encode human language as mathematical vectors. Research on the challenging problem of detecting (and mitigating) bias in these embeddings is mostly conducted for English. However, the encoded stereotypes can be language-dependent and shaped by cultural context, so dedicated research efforts for languages other than English are required. In this paper, we conduct a systematic literature review to identify and compare existing bias detection methods for non-English word embeddings and language models. As an interdisciplinary team, we examine the technical aspects of these methods as well as the definitions of bias used by researchers in the field. Based on our findings, we outline a research plan for making bias detection in NLP more inclusive of languages other than English.
About the journal:
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.