Can an Online Service Predict Gender? On the State-of-the-Art in Gender Identification from Texts

2019 IEEE/ACM 2nd International Workshop on Gender Equality in Software Engineering (GE). Pub Date: 2019-05-01. DOI: 10.1109/GE.2019.00012

Stefan Krüger, Ben Hermann


Citations: 10

Abstract

Gender equality initiatives often face a problem: to determine whether an initiative is successful, the gender of the individuals in the target group must be known. Because self-identification requires individuals to respond, its results may be biased and incomplete, so the temptation to use automated gender identification methods is evident. In the scientific literature, multiple sources, ranging from an individual’s name and social media choices through biological features (e.g., brain scans or fingerprints) to texts attributed to the individual, are used for automated gender identification with varying success. In this paper, we systematically inspect scientific publications on gender prediction from textual data published between January 2017 and January 2019 to determine whether such approaches offer a viable means of reliably determining an author’s gender. We find that the best approach in the current state of the art achieves an accuracy of only 93.4%. Moreover, we discuss the harm that gender identification systems might cause through their inaccuracy and through their assumption of a binary gender model. We conclude that gender identification based on textual data is currently no reliable substitute for self-identification.
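To make the 93.4% figure concrete, the following minimal sketch computes how many individuals would be expected to receive a wrong label at that accuracy. The group size of 1,000 and the function name `expected_misclassified` are hypothetical and chosen only for illustration. The sketch also assumes that the reported accuracy carries over unchanged to the target group, which the abstract does not claim.

```python
def expected_misclassified(group_size: int, accuracy: float) -> float:
    """Expected number of wrongly labelled individuals at a given accuracy.

    Assumes (for illustration only) that errors are spread evenly across
    the group and that the accuracy transfers to this group unchanged.
    """
    if group_size < 0:
        raise ValueError("group_size must be non-negative")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return group_size * (1.0 - accuracy)


# Best accuracy reported in the abstract.
BEST_REPORTED_ACCURACY = 0.934
# Hypothetical target-group size, not taken from the paper.
HYPOTHETICAL_GROUP_SIZE = 1000

if __name__ == "__main__":
    errors = expected_misclassified(HYPOTHETICAL_GROUP_SIZE, BEST_REPORTED_ACCURACY)
    print(f"Expected misclassifications: about {errors:.0f} of {HYPOTHETICAL_GROUP_SIZE}")
```

Under these assumptions, roughly 66 of every 1,000 individuals would be assigned the wrong gender, before accounting for anyone whom a binary model cannot represent at all.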

