Positionality-aware machine learning: translation tutorial

Christine Kaeser-Chen, Elizabeth Dubois, Friederike Schuur, E. Moss
{"title":"Positionality-aware machine learning: translation tutorial","authors":"Christine Kaeser-Chen, Elizabeth Dubois, Friederike Schuur, E. Moss","doi":"10.1145/3351095.3375666","DOIUrl":null,"url":null,"abstract":"Positionality is a person's unique and always partial view of the world which is shaped by social and political contexts. Machine Learning (ML) systems have positionality, too, as a consequence of the choices we make when we develop ML systems. Being positionality-aware is key for ML practitioners to acknowledge and embrace the necessary choices embedded in ML by its creators. When groups form a shared view of the world, or group positionality, they have the power to embed and institutionalize their unique perspectives in artifacts such as standards and ontologies. For example, the international standard for reporting diseases and health conditions (International Classification of Diseases, ICD) is shaped by a distinctly medical, European and North American perspective. It dictates how we collect data, and limits what questions we can ask of data and what ML systems we can develop. Researchers struggle to study the effects of social factors on health outcomes because of what the ICD renders legible (usually in medicalized terms) and what it renders invisible (usually social contexts) in data. The ICD, as with all information infrastructures, promotes and propagates the perspective(s) of its creators. Over time, it establishes what counts as \"truth\". Positionality, and how it embeds itself in standards, ontologies, and data collection, is the root for bias in our data and algorithms. Every perspective has its limits - there is no view from nowhere. Without an awareness of positionality, the current debate on bias in machine learning is quite limited: adding more data to the set cannot remove bias. Instead, we propose positionality-aware ML, a new workflow focused on continuous evaluation and improvement of the fit between the positionality embedded in ML systems and the scenarios within which it is deployed. To demonstrate how to uncover positionality in standards, ontologies, data, and ML systems, we discuss recent work on online harassment of Canadian journalists and politicians on Twitter. Using legal definitions of hate speech and harassment, Twitter's community standards, and insight from interviews with journalists and politicians, we created standards and annotation guidelines for labeling the intensity of harassment in tweets. We then hand labeled a sample of data and through this process identified instances where positionality impacts choices about how many categories of harassment should exist, how to label boundary cases, and how to interpret messy data. We take three perspectives---technical, systems, socio-technical---that when combined illuminate areas of tension which serve as a signal of misalignment between the positionality embedded in the ML system and the deployment context. We demonstrate how the concept of positionality allows us to delineate sets of use cases that may not be suited for automated, ML solutions. 
Finally, we discuss strategies for developing positionality-aware ML systems, which embed a positionality appropriate for the application context, and continuously evolve to maintain this contextual fit, with an emphasis on the need for of democratic, egalitarian dialogues between knowledge-producing groups.","PeriodicalId":377829,"journal":{"name":"Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3351095.3375666","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Positionality is a person's unique and always partial view of the world, shaped by social and political contexts. Machine Learning (ML) systems have positionality, too, as a consequence of the choices we make when we develop them. Being positionality-aware is key for ML practitioners to acknowledge and embrace the necessary choices embedded in ML by its creators. When groups form a shared view of the world, or group positionality, they have the power to embed and institutionalize their unique perspectives in artifacts such as standards and ontologies. For example, the international standard for reporting diseases and health conditions (International Classification of Diseases, ICD) is shaped by a distinctly medical, European and North American perspective. It dictates how we collect data, and it limits what questions we can ask of data and what ML systems we can develop. Researchers struggle to study the effects of social factors on health outcomes because of what the ICD renders legible in data (usually medicalized terms) and what it renders invisible (usually social contexts). The ICD, like all information infrastructures, promotes and propagates the perspective(s) of its creators. Over time, it establishes what counts as "truth". Positionality, and how it embeds itself in standards, ontologies, and data collection, is the root of bias in our data and algorithms. Every perspective has its limits: there is no view from nowhere. Without an awareness of positionality, the current debate on bias in machine learning is quite limited: adding more data to a dataset cannot remove bias. Instead, we propose positionality-aware ML, a new workflow focused on continuously evaluating and improving the fit between the positionality embedded in an ML system and the scenarios in which it is deployed. To demonstrate how to uncover positionality in standards, ontologies, data, and ML systems, we discuss recent work on online harassment of Canadian journalists and politicians on Twitter. Using legal definitions of hate speech and harassment, Twitter's community standards, and insights from interviews with journalists and politicians, we created standards and annotation guidelines for labeling the intensity of harassment in tweets. We then hand-labeled a sample of data and, through this process, identified instances where positionality shapes choices about how many categories of harassment should exist, how to label boundary cases, and how to interpret messy data. We take three perspectives (technical, systems, socio-technical) that, when combined, illuminate areas of tension which serve as a signal of misalignment between the positionality embedded in the ML system and the deployment context. We demonstrate how the concept of positionality allows us to delineate sets of use cases that may not be suited to automated ML solutions. Finally, we discuss strategies for developing positionality-aware ML systems that embed a positionality appropriate for the application context and continuously evolve to maintain this contextual fit, with an emphasis on the need for democratic, egalitarian dialogues between knowledge-producing groups.
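
To make the annotation step above concrete, here is a minimal, hypothetical sketch (not the authors' tooling): it assumes invented intensity categories and toy data, and shows how hand labeling with an explicit boundary-case flag can keep annotator disagreement visible as a signal of positionality, instead of averaging it away into a single "ground truth" label.

```python
# Illustrative sketch only; category names, annotators, and tweets are
# invented for this example and are not from the tutorial's dataset.
from collections import Counter
from dataclasses import dataclass

# Hypothetical intensity categories: the abstract notes that even the number
# of categories is a positional choice rather than a given.
INTENSITY_LABELS = ["none", "ambiguous", "harassing", "severe"]

@dataclass
class Annotation:
    tweet_id: str
    annotator: str
    label: str           # expected to be one of INTENSITY_LABELS
    boundary_case: bool  # annotator flagged the item as hard to place

def review_candidates(annotations: list[Annotation]) -> dict[str, Counter]:
    """Group labels per tweet and keep items whose annotators disagree or
    flagged a boundary case; these prompt a review of the guidelines rather
    than being forced into a single majority-vote 'ground truth'."""
    labels: dict[str, Counter] = {}
    flagged: set[str] = set()
    for a in annotations:
        labels.setdefault(a.tweet_id, Counter())[a.label] += 1
        if a.boundary_case:
            flagged.add(a.tweet_id)
    return {tid: c for tid, c in labels.items() if len(c) > 1 or tid in flagged}

if __name__ == "__main__":
    sample = [
        Annotation("t1", "ann_a", "harassing", boundary_case=False),
        Annotation("t1", "ann_b", "ambiguous", boundary_case=True),
        Annotation("t2", "ann_a", "none", boundary_case=False),
        Annotation("t2", "ann_b", "none", boundary_case=False),
    ]
    # "t1" is reported: its annotators disagree and one flagged it as a
    # boundary case, a cue to revisit the guidelines rather than discard it.
    print(review_candidates(sample))
```

The design choice worth noting is that disagreement is returned for review rather than resolved automatically, mirroring the abstract's emphasis on continuously evaluating the fit between embedded positionality and the deployment context.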