Can an Online Service Predict Gender? On the State-of-the-Art in Gender Identification from Texts
Authors: Stefan Krüger, Ben Hermann
Venue: 2019 IEEE/ACM 2nd International Workshop on Gender Equality in Software Engineering (GE)
Publication date: 2019-05-01
DOI: 10.1109/GE.2019.00012 (https://doi.org/10.1109/GE.2019.00012)
Citations: 10
Abstract
Gender equality initiatives often face a problem: to determine whether an initiative is successful, the gender of the individuals in the target group must be known. Self-identification requires individuals to respond, so its results may be biased and incomplete, which makes automated gender identification methods tempting. The scientific literature uses multiple sources for automated gender identification, with varying success: the individual’s name, their social media choices, biological features (e.g., brain scans or fingerprints), and texts attributed to the individual. In this paper, we systematically inspect scientific publications on gender prediction from textual data published between January 2017 and January 2019 to determine whether such approaches provide a viable means of reliably determining an author’s gender. We find that the best approach in the current state of the art achieves an accuracy of only 93.4%. Moreover, we discuss the possible harm that gender identification systems might cause, both because of their inaccuracy and because they assume a binary gender model. We conclude that gender identification based on textual data is currently no reliable substitute for self-identification.