Gender Bias and Under-Representation in Natural Language Processing Across Human Languages

Abigail V. Matthews
{"title":"跨人类语言的自然语言处理中的性别偏见和代表性不足","authors":"Abigail V. Matthews","doi":"10.1145/3461702.3462530","DOIUrl":null,"url":null,"abstract":"Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to a bias in which voices they represent. In this paper, a team including speakers of 9 languages - Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof - reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP-pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP-pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages like Spanish, Arabic, German, French and Urdu that have grammatically gendered nouns including different feminine, masculine and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in different languages as well as across some corpora of more traditional literature.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Gender Bias and Under-Representation in Natural Language Processing Across Human Languages\",\"authors\":\"Abigail V. Matthews\",\"doi\":\"10.1145/3461702.3462530\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to a bias in which voices they represent. In this paper, a team including speakers of 9 languages - Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof - reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP-pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP-pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages like Spanish, Arabic, German, French and Urdu that have grammatically gendered nouns including different feminine, masculine and neuter profession words. 
We compare these gender bias measurements across the Wikipedia corpora in different languages as well as across some corpora of more traditional literature.\",\"PeriodicalId\":197336,\"journal\":{\"name\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3461702.3462530\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3461702.3462530","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 22

Abstract

Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems making crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to bias in which voices they represent. In this paper, a team including speakers of nine languages (Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof) reports and analyzes measurements of gender bias in the Wikipedia corpora for these nine languages. In the process, we also document how our work exposes crucial gaps in the NLP pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to eight other languages, including languages such as Spanish, Arabic, German, French, and Urdu that have grammatically gendered nouns, with distinct feminine, masculine, and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in the different languages, as well as across some corpora of more traditional literature.
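The abstract does not spell out the metric itself, but a minimal sketch of a profession-level gender bias score of the kind it describes might look like the following. The formula (a smoothed log ratio of masculine to feminine gendered words co-occurring with a profession term within a fixed token window), the gender lexicons, and the window size are all illustrative assumptions, not the paper's published method.

```python
# Sketch of a co-occurrence-based, profession-level gender bias score.
# Assumptions: log-ratio form, tiny English gender lexicons, fixed window.
import math
import re
from collections import Counter

MASCULINE = {"he", "him", "his", "man", "men"}       # assumed lexicon
FEMININE = {"she", "her", "hers", "woman", "women"}  # assumed lexicon

def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; real multilingual pipelines need
    language-specific tokenization, which is part of the gap at issue."""
    return re.findall(r"[a-z']+", text.lower())

def profession_bias(tokens: list[str], profession: str, window: int = 10):
    """Log ratio of masculine to feminine words near `profession`.

    Positive values lean masculine, negative lean feminine, 0 is balanced.
    Returns None if the profession never co-occurs with gendered words.
    """
    masc = fem = 0
    for i, tok in enumerate(tokens):
        if tok != profession:
            continue
        context = Counter(tokens[max(0, i - window): i + window + 1])
        masc += sum(context[w] for w in MASCULINE)
        fem += sum(context[w] for w in FEMININE)
    if masc == 0 and fem == 0:
        return None
    # Add-one smoothing keeps the ratio finite when one count is zero.
    return math.log((masc + 1) / (fem + 1))

corpus = "The nurse said she was tired. The engineer said he would fix it."
tokens = tokenize(corpus)
for job in ("nurse", "engineer"):
    print(job, profession_bias(tokens, job, window=3))
```

A corpus-level score could then aggregate the per-profession values, for example by averaging them or their absolute values. For the grammatically gendered languages the paper covers, the lexicons and profession lists would have to be rebuilt per language, with separate feminine, masculine, and neuter profession forms.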