Christine Kaeser-Chen, Elizabeth Dubois, Friederike Schuur, E. Moss
{"title":"Positionality-aware machine learning: translation tutorial","authors":"Christine Kaeser-Chen, Elizabeth Dubois, Friederike Schuur, E. Moss","doi":"10.1145/3351095.3375666","DOIUrl":null,"url":null,"abstract":"Positionality is a person's unique and always partial view of the world which is shaped by social and political contexts. Machine Learning (ML) systems have positionality, too, as a consequence of the choices we make when we develop ML systems. Being positionality-aware is key for ML practitioners to acknowledge and embrace the necessary choices embedded in ML by its creators. When groups form a shared view of the world, or group positionality, they have the power to embed and institutionalize their unique perspectives in artifacts such as standards and ontologies. For example, the international standard for reporting diseases and health conditions (International Classification of Diseases, ICD) is shaped by a distinctly medical, European and North American perspective. It dictates how we collect data, and limits what questions we can ask of data and what ML systems we can develop. Researchers struggle to study the effects of social factors on health outcomes because of what the ICD renders legible (usually in medicalized terms) and what it renders invisible (usually social contexts) in data. The ICD, as with all information infrastructures, promotes and propagates the perspective(s) of its creators. Over time, it establishes what counts as \"truth\". Positionality, and how it embeds itself in standards, ontologies, and data collection, is the root for bias in our data and algorithms. Every perspective has its limits - there is no view from nowhere. Without an awareness of positionality, the current debate on bias in machine learning is quite limited: adding more data to the set cannot remove bias. Instead, we propose positionality-aware ML, a new workflow focused on continuous evaluation and improvement of the fit between the positionality embedded in ML systems and the scenarios within which it is deployed. To demonstrate how to uncover positionality in standards, ontologies, data, and ML systems, we discuss recent work on online harassment of Canadian journalists and politicians on Twitter. Using legal definitions of hate speech and harassment, Twitter's community standards, and insight from interviews with journalists and politicians, we created standards and annotation guidelines for labeling the intensity of harassment in tweets. We then hand labeled a sample of data and through this process identified instances where positionality impacts choices about how many categories of harassment should exist, how to label boundary cases, and how to interpret messy data. We take three perspectives---technical, systems, socio-technical---that when combined illuminate areas of tension which serve as a signal of misalignment between the positionality embedded in the ML system and the deployment context. We demonstrate how the concept of positionality allows us to delineate sets of use cases that may not be suited for automated, ML solutions. 
Finally, we discuss strategies for developing positionality-aware ML systems, which embed a positionality appropriate for the application context, and continuously evolve to maintain this contextual fit, with an emphasis on the need for of democratic, egalitarian dialogues between knowledge-producing groups.","PeriodicalId":377829,"journal":{"name":"Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3351095.3375666","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
Positionality is a person's unique and always partial view of the world, shaped by social and political contexts. Machine learning (ML) systems have positionality, too, as a consequence of the choices we make when we develop them. Being positionality-aware allows ML practitioners to acknowledge and embrace the necessary choices that creators embed in ML systems. When groups form a shared view of the world, or group positionality, they have the power to embed and institutionalize their unique perspectives in artifacts such as standards and ontologies. For example, the international standard for reporting diseases and health conditions, the International Classification of Diseases (ICD), is shaped by a distinctly medical, European and North American perspective. It dictates how we collect data, and it limits what questions we can ask of data and what ML systems we can develop. Researchers struggle to study the effects of social factors on health outcomes because of what the ICD renders legible in data (usually in medicalized terms) and what it renders invisible (usually social contexts). The ICD, like all information infrastructures, promotes and propagates the perspectives of its creators; over time, it establishes what counts as "truth".

Positionality, and how it embeds itself in standards, ontologies, and data collection, is the root of bias in our data and algorithms. Every perspective has its limits; there is no view from nowhere. Without an awareness of positionality, the current debate on bias in machine learning is quite limited: adding more data cannot remove bias. Instead, we propose positionality-aware ML, a workflow focused on continuously evaluating and improving the fit between the positionality embedded in an ML system and the scenarios in which it is deployed.

To demonstrate how to uncover positionality in standards, ontologies, data, and ML systems, we discuss recent work on the online harassment of Canadian journalists and politicians on Twitter. Drawing on legal definitions of hate speech and harassment, Twitter's community standards, and insights from interviews with journalists and politicians, we created standards and annotation guidelines for labeling the intensity of harassment in tweets. We then hand-labeled a sample of data and, through this process, identified instances where positionality shapes choices about how many categories of harassment should exist, how to label boundary cases, and how to interpret messy data. We take three perspectives (technical, systems, and socio-technical) that, when combined, illuminate areas of tension; such tension signals a misalignment between the positionality embedded in the ML system and the deployment context. We show how the concept of positionality allows us to delineate sets of use cases that may not be suited to automated ML solutions. Finally, we discuss strategies for developing positionality-aware ML systems that embed a positionality appropriate for the application context and continuously evolve to maintain this contextual fit, with an emphasis on the need for democratic, egalitarian dialogues between knowledge-producing groups.
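The hand-labeling step described above suggests one concrete way such tension can surface in practice. The sketch below is a minimal illustration, not code from the tutorial: the four-level intensity scale, the label names, the disagreement threshold, and the function names are all assumptions made for demonstration. It flags tweets on which annotators disagree strongly, the kind of boundary case where differing positionalities become visible.

```python
# Illustrative sketch only: the tutorial publishes no code, so the intensity
# scale, label names, threshold, and function names below are assumptions.
from statistics import mean

# Hypothetical ordinal scale for harassment intensity.
SCALE = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}


def disagreement(labels):
    """Mean absolute pairwise distance between annotators' ordinal labels."""
    values = [SCALE[label] for label in labels]
    pairs = [(a, b) for i, a in enumerate(values) for b in values[i + 1:]]
    return mean(abs(a - b) for a, b in pairs) if pairs else 0.0


def flag_boundary_cases(annotations, threshold=1.0):
    """Return ids of tweets whose annotator disagreement meets the threshold.

    High disagreement is read here as a signal that annotators'
    positionalities lead them to interpret the same tweet differently.
    """
    return [tweet_id for tweet_id, labels in annotations.items()
            if disagreement(labels) >= threshold]


# Toy example: three annotators label two tweets.
annotations = {
    "tweet_001": ["mild", "mild", "moderate"],    # broad agreement
    "tweet_002": ["none", "severe", "moderate"],  # positionality-driven split
}
print(flag_boundary_cases(annotations))  # -> ['tweet_002']
```

In a positionality-aware workflow, such flags would prompt revisiting the annotation guidelines, or asking whether the use case is suited to an automated ML solution at all, rather than being quietly resolved by majority vote.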