Detecting gender bias in Arabic text through word embeddings
Aya Mourad, Fatima K Abu Salem, Shady Elbassuoni
PLoS ONE 20(3): e0319301. Published 2025-03-31. DOI: https://doi.org/10.1371/journal.pone.0319301
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11957338/pdf/
For generations, women have fought to achieve equal rights with men. Many historians and social scientists have examined this uphill path with a focus on women's rights and economic status in the West. Other parts of the world, such as the Middle East, remain understudied, with a noticeable shortage of gender-based statistics in the economic arena. According to the sociocognitive theory of critical discourse analysis, language discourse reflects social behaviors and norms. This motivates the present study, in which we examine gender-based biases in various occupations as reflected in textual corpora. Several works in the literature have shown that word embedding models can learn biases from the textual data they are trained on, which can propagate societal prejudices that are implicitly embedded in such text. In our study, we adapt the WEAT and Direct Bias quantification tests for Arabic to examine gender bias with respect to a wide set of occupations as reflected in various Arabic text datasets. These datasets include two Lebanese news archives, Arabic Wikipedia, and electronic newspapers in the UAE, Egypt, and Morocco, thus providing different outlooks on female and male engagement in various professions. Our WEAT tests indicate that, across all datasets, words related to careers, science, and intellectual pursuits are linked to men, whereas words related to family and art are associated with women. The Direct Bias analysis shows a consistent female gender bias towards professions such as nurse, house cleaner, maid, secretary, and dancer. In the Moroccan News Articles Dataset (MNAD), women were additionally associated with occupations such as researcher, doctor, and professor. Considering that the Arab world remains short on census data exploring gender-based disparities across various professions, our work provides evidence that such stereotypes persist to this day.
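For readers unfamiliar with the two tests named in the abstract, the following is a minimal sketch, not the authors' Arabic adaptation. It assumes the commonly used WEAT effect size (difference of mean associations divided by the sample standard deviation over both target sets) and a simplified Direct Bias in which the gender direction is the difference between the mean male and mean female vectors, rather than a direction obtained by principal component analysis. All function names, set names, and the 2-D toy vectors are hypothetical and exist only for illustration.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B.

    Assumption: the denominator is the sample standard deviation
    of the association scores over all target words in X and Y.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


def direct_bias(words, g, c=1.0):
    """Mean of |cos(w, g)|**c over words expected to be gender-neutral."""
    return mean(abs(cosine(w, g)) ** c for w in words)


# Hypothetical 2-D "embeddings": axis 0 leans male, axis 1 leans female.
male = [[1.0, 0.0], [0.9, 0.1]]
female = [[0.0, 1.0], [0.1, 0.9]]
career = [[0.8, 0.2], [0.9, 0.3]]   # target set X
family = [[0.2, 0.8], [0.3, 0.9]]   # target set Y
occupations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Simplified gender direction (assumption): mean(male) - mean(female).
g = [mean(m) - mean(f) for m, f in zip(zip(*male), zip(*female))]

effect = weat_effect_size(career, family, male, female)
db = direct_bias(occupations, g)
print("WEAT effect size:", effect)
print("Direct Bias:", db)
```

A positive effect size here means the first target set (career) sits closer to the male attribute vectors than the second (family) does. A Direct Bias near 0 means the listed words carry almost no component along the gender direction, while values near 1 mean they are almost fully aligned with it. With real embeddings, the toy lists would be replaced by vectors looked up from a model trained on each corpus.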
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, with authors retaining copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage