{"title":"Contesting efficacy: Tensions between risk and inclusion in computer vision technology","authors":"Morgan Klaus Scheuerman","doi":"10.1002/fhu2.12","DOIUrl":null,"url":null,"abstract":"<p>Machine learning (ML) methods are now commonly used to make automated predictions about human beings—their lives and their characteristics. Vast amounts of individual data are aggregated to make predictions about people's shopping preferences, health status, or likelihood to recommit a crime. <i>Computer vision</i>, an ML task for training a computer to metaphorically ‘see’ specific objects, is a pertinent domain for examining the interaction between ML and human identity. <i>Facial analysis (FA)</i>, a subset of computer vision trained to complete tasks like facial classification and facial recognition, is trained to read visual data to make classifications about innate human identities. Identities like age (Lin et al., <span>2006</span>), gender (Khan et al., <span>2013</span>), ethnicity (Lu & Jain, <span>2004</span>) and even sexual orientation (Wang & Kosinski, <span>2017</span>). Often, decisions about identity characteristics are made without explicit user input—or even user knowledge. Users, effectively, become ‘targets’ of the system, having no ability to contest these classifications. Surrounding these identity classifications are concerns about bias (e.g., Buolamwini & Gebru, <span>2018</span>), representation (e.g., Hamidi et al., <span>2018</span>; Keyes, <span>2018</span>) and the embracing of pseudoscientific practices like physiognomy (e.g., Agüera y Arcas et al., <span>2017</span>).</p><p>In this short paper, I present several considerations for contestability for computer vision. By contestability, I refer to the agency that an individual has to contest the inputs and outputs of a computer vision system—including how one's data is collected, defined and used. I specifically focus on one identity trait for which to ground consideration: <i>gender</i>. Gender is a salient characteristic to consider given that criticisms of computer vision have stemmed from concerns of both sexism and cissexism, discrimination against transgender and nonbinary communities (Hibbs, <span>2014</span>). Gender in computer vision has largely been presented as binary (i.e., male vs. female) and has been exclusive of genders beyond the cisgender norm (e.g., in automatic gender recognition (AGR) systems that classify gender explicitly [Hamidi et al., <span>2018</span>; Keyes, <span>2018</span>; Scheuerman et al., <span>2019</span>]; in facial recognition systems that fail to properly recognise noncisgender male faces [Albiero et al., <span>2020</span>, <span>2022</span>; Urbi, <span>2018</span>]).</p><p>More specifically, I question whether the efficacy of AI technologies, like computer vision, are the correct pathway to ‘inclusivity’ for historically marginalised identities, like cisgender women and trans communities. By efficacy, I refer to the technical capability of a computer vision system to accurately classify or recognise diverse genders. Inclusivity thus refers to the inclusion of diverse genders in effective classification, rather than solely a cisgender male/female binary approach to gender. 
That is, I question whether a technology <i>working effectively</i> on marginalised populations is truly the form of inclusivity we should be striving towards, given increasing scrutiny that AI should even be used to accomplish tasks that may alter human lives.</p><p>Gender is a highly ubiquitous identity characteristic classified in computer vision. So much so, that there are computer vision models trained specifically for the task of classifying gender. <i>AGR</i> has been coined to describe gender classification methods in computer vision, like facial and body analysis (Hamidi et al., <span>2018</span>). ML researchers have contributed a great deal of effort into improving methods in pattern recognition for improving gender classification tasks—specifically, improving the accuracy of such tasks (e.g., Akbulut et al., <span>2017</span>). Proposed methods range from extracting facial morphology (Ramey & Salichs, <span>2014</span>) to modelling gait (Yu et al., <span>2009</span>) to extracting hair features (Lee & Wei, <span>2013</span>). Gender classification in computer vision has become so ubiquitous, it has been featured in almost every commercial FA service available for purchase (e.g., Amazon Rekognition – Video and Image—AWS, <span>2019</span>; Face API—Facial Recognition Software Microsoft Azure, <span>2019</span>; Watson Visual Recognition, <span>2019</span>).</p><p>As with most ML techniques that use human characteristics, concerns about fairness and bias have inundated AGR. Efforts to ensure that the predefined gender categories perform fairly on gender recognition targets has become a major of focus of this literature. For example, Buolamwini and Gebru (<span>2018</span>) notably found higher gender misclassification rates on women with darker skin than both men and women with lighter skin. Research on improving gender classification in computer vision has discussed that gender in AGR is performed solely on a binary—male or female, man or woman, masculine or feminine. This classification schema leaves out those who traverse the gender binary, or fall outside of it: trans and/or nonbinary1 people. In both academic AGR literature (Keyes, <span>2018</span>; Scheuerman et al., <span>2020</span>) and commercial settings (Scheuerman et al., <span>2019</span>), the possibility for gender to exist outside of the cisnormative2 has been largely erased. In fact, because AGR is largely trained on, presumably, cisgender, binary, normative faces, is it much more likely to perform poorly even on binary transgender individuals (Scheuerman et al., <span>2019</span>).</p><p>The only AGR work to date focused on improving AGR for trans individuals has been focused on recognising individual trans people across physical gender transition, using screenshots of educatory gender transition videos scraped from YouTube (Vincent, <span>2017b</span>). This work is arguably more focused on trans identity as a security concern than improving efficacy on behalf of transgender individuals. Thus, concerns about fairness in AGR have extended beyond bias auditing, raising questions about representation in technical systems and the harmful effects simplistic representations could have on individuals with marginalised genders—and larger community norms around gender. 
Even when companies claim to only encourage gender at the aggregate level, rather than the individual level (e.g., Amazon Rekognition), the continued use of stereotypical and reductive gender categories fuels an increasingly hostile sociopolitical climate against noncisgender individuals (see 2024 Anti-Trans Bills, <span>2024</span>).</p><p>Scrutiny of AGR's efficacy on women and trans people, and its reliance on sex-gender conflations broadly (Scheuerman et al., <span>2021</span>), has led to some effective changes in commercial AGR systems recently. Many of the commercial systems which explicitly classify gender presented in critical scholarship (Buolamwini & Gebru, <span>2018</span>; Scheuerman et al., <span>2019</span>) have since removed AGR from their functionalities. For example, Microsoft's Azure no longer offers gender classification due to the potential risks (Bird, <span>2022</span>). Gender, at least for larger technology organisations, has shifted from an obvious classification feature to an ethically fraught one (Gustafson et al., <span>2023</span>). However, academic research on AGR continues ahead, largely unconcerned, still touting the importance of gender classification tasks to the field of computer vision (e.g., Patel & Patel, <span>2023</span>; Reddy et al., <span>2023</span>). Further, smaller companies across the globe continue to develop and deploy AGR (e.g., Clarifai, Face++, SenseTime).</p><p>Gender in computer vision is also not limited to AGR tasks. Gender also shows up in the qualitative labels in image tagging models (Barlas et al., <span>2021</span>; Katzman et al., <span>2023</span>; Scheuerman et al., <span>2019</span>, <span>2020</span>). Now, in the age of generative AI, gendered imagery is something that can be generated using text prompts (Bianchi et al., <span>2023</span>; Bird et al., <span>2023</span>; Sun et al., <span>2024</span>). And of course, gender is a salient identity factor in facial recognition technology (Albiero et al., <span>2020</span>, <span>2022</span>; Urbi, <span>2018</span>).</p><p>A major facet of concerns about the deployment of gender in computer vision is around <i>agency</i>: the agency to contest what classification decisions are made, the agency to define one's own gender in the classification schema, the agency to determine how gender characteristics will be used, and the agency to participate in training and evaluating AGR techniques in the first place. But giving users agency over how their identities are classified by a computer vision system presents several challenges—technically and ethically.</p><p>Researchers critical of gender classifications in computer vision technologies suggest, among other considerations, that agency over representation in a system can help alleviate some concerns about inadequate gender constructions (Hamidi et al., <span>2018</span>; Keyes, <span>2018</span>; Scheuerman et al., <span>2019</span>). Allowing individuals with diverse genders to define more nuanced and inclusive schemas for defining gender in computer vision systems could ideally alleviate concerns about stereotypes and cisnormative binaries. However, there are a number of barriers to implementing user input and contestable interfaces when dealing with ML-based systems, like computer vision. In particular, I will focus on the <i>technical</i> and the <i>ethical</i> obstacles to increasing user agency with the goal of more inclusive gender systems, highlighting what the tradeoffs might be when attempting to implement them. 
These challenges are not exhaustive, but provide a brief overview of some of the technical and social considerations (which may also often intersect or diverge) to building effective and inclusive gender approaches in computer vision.</p><p>What does it mean for computer vision learning to be <i>more inclusive</i>? At present, creating more inclusive models is largely proposed as creating <i>more effective</i> models. That is, computer vision models which work well on a more diverse population are perceived as a successful measure of inclusivity. Yet, many approaches to increasing the inclusiveness of computer vision tasks, such as AGR including trans populations, decreases its efficacy. Computer vision models with a ‘non-binary’ category will likely fail more often on binary categories.</p><p>Further, incorporating mechanisms for user agency and contestability, both technically and ethically, may lead to fewer training data, decreased diversity in that training data (dependent on who opts in), larger variation of gender labels, and users who simply want nothing to do with AGR. Increasing the number of genders which might be classified in AGR may create less accurate, and thus effective, classification systems. And many trans people may not be wished to be classified by such systems at all, meaning <i>effective</i> is not an appropriate measure of <i>inclusive</i>.</p><p>All in all, there are many technical barriers and ethical risks (and tradeoffs) to be considered when trying to implement more diverse identity classifications, begging the question of whether ‘more inclusive’ is the right path for FA technologies in the first place. Do we <i>want</i> our FA systems to be <i>effectively</i> inclusive? Do we want trans populations to be more accurately classified by AGR—and other computer vision—models? Or do we approach inclusivity from a different perspective—one which centres the social and political risks faced by trans communities, rather than the efficacy of AI?</p><p>If the answer is yes, to centre affected populations like trans communities first and foremost, we may have to sacrifice efficacy. We may have to accept systems which do not work well, but centre diversity and agency. We may have to accept systems which do not work well <i>on trans communities</i>, but otherwise protect them from being classified explicitly as, for example, ‘non-binary’. We may have to regulate how specific use cases, such as AGR, can be used, so those people misclassified by ineffective systems are not penalised. We may, in the end, have to stop developing AGR models altogether.</p><p>But most importantly, whether we choose to prioritise <i>effective</i> inclusivity or <i>ineffective</i> inclusivity, we have to establish stringent ethical policies that prevent the misuse of computer vision on nonconsenting and marginalised individuals. 
Interdisciplinary researchers focused on ethical AI have the opportunity to shift the axis of power towards the most marginalised in society; we have the capability of ensuring our systems are effective at progressing collective goals, which may actually mean ensuring they are <i>ineffective</i>.</p><p>The author declares no conflict of interest.</p>","PeriodicalId":100563,"journal":{"name":"Future Humanities","volume":"2 1-2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/fhu2.12","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Humanities","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/fhu2.12","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Machine learning (ML) methods are now commonly used to make automated predictions about human beings—their lives and their characteristics. Vast amounts of individual data are aggregated to make predictions about people's shopping preferences, health status, or likelihood of recommitting a crime. Computer vision, an ML task for training a computer to metaphorically ‘see’ specific objects, is a pertinent domain for examining the interaction between ML and human identity. Facial analysis (FA), a subset of computer vision trained to complete tasks like facial classification and facial recognition, is trained to read visual data to make classifications about innate human identities: age (Lin et al., 2006), gender (Khan et al., 2013), ethnicity (Lu & Jain, 2004) and even sexual orientation (Wang & Kosinski, 2017). Often, decisions about identity characteristics are made without explicit user input—or even user knowledge. Users effectively become ‘targets’ of the system, with no ability to contest these classifications. Surrounding these identity classifications are concerns about bias (e.g., Buolamwini & Gebru, 2018), representation (e.g., Hamidi et al., 2018; Keyes, 2018) and the embrace of pseudoscientific practices like physiognomy (e.g., Agüera y Arcas et al., 2017).
In this short paper, I present several considerations for contestability in computer vision. By contestability, I refer to the agency that an individual has to contest the inputs and outputs of a computer vision system—including how one's data is collected, defined and used. I specifically focus on one identity trait in which to ground this consideration: gender. Gender is a salient characteristic to consider given that criticisms of computer vision have stemmed from concerns about both sexism and cissexism, discrimination against transgender and nonbinary communities (Hibbs, 2014). Gender in computer vision has largely been presented as binary (i.e., male vs. female) and has been exclusive of genders beyond the cisgender norm (e.g., in automatic gender recognition (AGR) systems that classify gender explicitly [Hamidi et al., 2018; Keyes, 2018; Scheuerman et al., 2019]; in facial recognition systems that fail to properly recognise the faces of those who are not cisgender men [Albiero et al., 2020, 2022; Urbi, 2018]).
More specifically, I question whether the efficacy of AI technologies like computer vision is the correct pathway to ‘inclusivity’ for historically marginalised identities, like cisgender women and trans communities. By efficacy, I refer to the technical capability of a computer vision system to accurately classify or recognise diverse genders. Inclusivity thus refers to the inclusion of diverse genders in effective classification, rather than solely a cisgender male/female binary approach to gender. That is, I question whether a technology working effectively on marginalised populations is truly the form of inclusivity we should be striving towards, given increasing scrutiny over whether AI should be used at all to accomplish tasks that may alter human lives.
Gender is a ubiquitous identity characteristic classified in computer vision. So much so that there are computer vision models trained specifically for the task of classifying gender. The term AGR was coined to describe gender classification methods in computer vision, like facial and body analysis (Hamidi et al., 2018). ML researchers have devoted a great deal of effort to improving pattern recognition methods for gender classification—specifically, to improving the accuracy of such tasks (e.g., Akbulut et al., 2017). Proposed methods range from extracting facial morphology (Ramey & Salichs, 2014) to modelling gait (Yu et al., 2009) to extracting hair features (Lee & Wei, 2013). Gender classification in computer vision has become so ubiquitous that it has been featured in almost every commercial FA service available for purchase (e.g., Amazon Rekognition – Video and Image—AWS, 2019; Face API—Facial Recognition Software Microsoft Azure, 2019; Watson Visual Recognition, 2019).
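To make concrete how such commercial services expose gender as a binary attribute, the minimal sketch below queries Amazon Rekognition's face detection endpoint via the boto3 SDK; at the time of writing, the returned Gender field takes only the values 'Male' or 'Female'. The image file name and credential setup are illustrative assumptions, not material from the original paper.

```python
import boto3

# Assumes AWS credentials are configured locally; "face.jpg" is a hypothetical input image.
client = boto3.client("rekognition")

with open("face.jpg", "rb") as f:
    response = client.detect_faces(Image={"Bytes": f.read()}, Attributes=["ALL"])

for face in response["FaceDetails"]:
    gender = face["Gender"]  # e.g., {"Value": "Male", "Confidence": 99.1}
    # The schema offers only a binary "Male"/"Female" value, illustrating the
    # cisnormative classification critiqued in this paper.
    print(gender["Value"], round(gender["Confidence"], 1))
```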
As with most ML techniques that use human characteristics, concerns about fairness and bias have inundated AGR. Efforts to ensure that the predefined gender categories perform fairly on gender recognition targets have become a major focus of this literature. For example, Buolamwini and Gebru (2018) notably found higher gender misclassification rates on women with darker skin than on both men and women with lighter skin. Research on gender classification in computer vision has also noted that gender in AGR is classified solely on a binary—male or female, man or woman, masculine or feminine. This classification schema leaves out those who traverse the gender binary, or fall outside of it: trans and/or nonbinary people. In both academic AGR literature (Keyes, 2018; Scheuerman et al., 2020) and commercial settings (Scheuerman et al., 2019), the possibility for gender to exist outside of the cisnormative has been largely erased. In fact, because AGR is largely trained on presumably cisgender, binary, normative faces, it is much more likely to perform poorly even on binary transgender individuals (Scheuerman et al., 2019).
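As an illustration of the kind of disaggregated auditing that surfaced these disparities, the sketch below computes misclassification rates per demographic subgroup from a set of hypothetical prediction records; the group labels, field names and values are invented for the example and are not data from the studies cited above.

```python
from collections import defaultdict

# Hypothetical audit records: annotated gender, predicted gender, and an
# intersectional subgroup label, in the spirit of disaggregated evaluation.
records = [
    {"group": "darker-skinned women", "true": "female", "pred": "male"},
    {"group": "lighter-skinned women", "true": "female", "pred": "female"},
    {"group": "darker-skinned men", "true": "male", "pred": "male"},
    {"group": "lighter-skinned men", "true": "male", "pred": "male"},
    # ... many more records in a real audit
]

errors = defaultdict(int)
totals = defaultdict(int)
for r in records:
    totals[r["group"]] += 1
    errors[r["group"]] += int(r["pred"] != r["true"])

# Per-subgroup misclassification rate; large gaps between subgroups signal the
# kind of bias reported in the literature cited above.
for group, total in totals.items():
    print(f"{group}: {errors[group] / total:.2%}")
```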
The only AGR work to date aimed at improving AGR for trans individuals has focused on recognising individual trans people across physical gender transition, using screenshots of educational gender transition videos scraped from YouTube (Vincent, 2017b). This work is arguably more focused on trans identity as a security concern than on improving efficacy on behalf of transgender individuals. Thus, concerns about fairness in AGR have extended beyond bias auditing, raising questions about representation in technical systems and the harmful effects simplistic representations could have on individuals with marginalised genders—and on larger community norms around gender. Even when companies claim to endorse gender classification only at the aggregate level, rather than the individual level (e.g., Amazon Rekognition), the continued use of stereotypical and reductive gender categories fuels an increasingly hostile sociopolitical climate against noncisgender individuals (see 2024 Anti-Trans Bills, 2024).
Scrutiny of AGR's efficacy on women and trans people, and of its reliance on sex-gender conflations broadly (Scheuerman et al., 2021), has recently led to some concrete changes in commercial AGR systems. Many of the commercial systems that explicitly classified gender and were examined in critical scholarship (Buolamwini & Gebru, 2018; Scheuerman et al., 2019) have since removed AGR from their functionalities. For example, Microsoft's Azure no longer offers gender classification due to the potential risks (Bird, 2022). Gender, at least for larger technology organisations, has shifted from an obvious classification feature to an ethically fraught one (Gustafson et al., 2023). However, academic research on AGR continues apace, largely unconcerned, still touting the importance of gender classification tasks to the field of computer vision (e.g., Patel & Patel, 2023; Reddy et al., 2023). Further, smaller companies across the globe continue to develop and deploy AGR (e.g., Clarifai, Face++, SenseTime).
Gender in computer vision is also not limited to AGR tasks. Gender also shows up in the qualitative labels in image tagging models (Barlas et al., 2021; Katzman et al., 2023; Scheuerman et al., 2019, 2020). Now, in the age of generative AI, gendered imagery is something that can be generated using text prompts (Bianchi et al., 2023; Bird et al., 2023; Sun et al., 2024). And of course, gender is a salient identity factor in facial recognition technology (Albiero et al., 2020, 2022; Urbi, 2018).
A major facet of concerns about the deployment of gender in computer vision centres on agency: the agency to contest what classification decisions are made, the agency to define one's own gender in the classification schema, the agency to determine how gender characteristics will be used, and the agency to participate in training and evaluating AGR techniques in the first place. But giving users agency over how their identities are classified by a computer vision system presents several challenges—both technical and ethical.
Researchers critical of gender classifications in computer vision technologies suggest, among other considerations, that agency over one's representation in a system can help alleviate some concerns about inadequate gender constructions (Hamidi et al., 2018; Keyes, 2018; Scheuerman et al., 2019). Allowing individuals with diverse genders to shape more nuanced and inclusive schemas for defining gender in computer vision systems could, ideally, alleviate concerns about stereotypes and cisnormative binaries. However, there are a number of barriers to implementing user input and contestable interfaces when dealing with ML-based systems like computer vision. In particular, I will focus on the technical and the ethical obstacles to increasing user agency with the goal of more inclusive gender systems, highlighting what the tradeoffs might be when attempting to implement them. These challenges are not exhaustive, but they provide a brief overview of some of the technical and social considerations (which may often intersect or diverge) in building effective and inclusive gender approaches in computer vision.
What does it mean for computer vision to be more inclusive? At present, creating more inclusive models is largely framed as creating more effective models. That is, computer vision models which work well on a more diverse population are treated as the measure of successful inclusivity. Yet many approaches to increasing the inclusiveness of computer vision tasks, such as extending AGR to include trans populations, decrease its efficacy. Computer vision models with a ‘non-binary’ category will likely fail more often on binary categories.
Further, incorporating mechanisms for user agency and contestability, both technically and ethically, may lead to less training data, decreased diversity in that training data (depending on who opts in), greater variation in gender labels, and users who simply want nothing to do with AGR. Increasing the number of genders which might be classified in AGR may create less accurate, and thus less effective, classification systems. And many trans people may not wish to be classified by such systems at all, meaning that ‘effective’ is not an appropriate measure of ‘inclusive’.
All in all, there are many technical barriers and ethical risks (and tradeoffs) to be considered when trying to implement more diverse identity classifications, raising the question of whether ‘more inclusive’ is the right path for FA technologies in the first place. Do we want our FA systems to be effectively inclusive? Do we want trans populations to be more accurately classified by AGR—and other computer vision—models? Or do we approach inclusivity from a different perspective—one which centres the social and political risks faced by trans communities, rather than the efficacy of AI?
If the answer is yes, that we should centre affected populations like trans communities first and foremost, we may have to sacrifice efficacy. We may have to accept systems which do not work well, but which centre diversity and agency. We may have to accept systems which do not work well on trans communities, but which otherwise protect them from being explicitly classified as, for example, ‘non-binary’. We may have to regulate specific use cases, such as AGR, so that people misclassified by ineffective systems are not penalised. We may, in the end, have to stop developing AGR models altogether.
But most importantly, whether we choose to prioritise effective inclusivity or ineffective inclusivity, we have to establish stringent ethical policies that prevent the misuse of computer vision on nonconsenting and marginalised individuals. Interdisciplinary researchers focused on ethical AI have the opportunity to shift the axis of power towards the most marginalised in society; we have the capability of ensuring our systems are effective at advancing collective goals, which may actually mean ensuring they are ineffective.