{"title":"有偏见的优先级,有偏见的结果:面向伦理的数据注释实践的三个建议","authors":"Gunay Kazimzade, Milagros Miceli","doi":"10.1145/3375627.3375809","DOIUrl":null,"url":null,"abstract":"In this paper, we analyze the relation between data-related biases and practices of data annotation, by placing them in the context of market economy. We understand annotation as a praxis related to the sensemaking of data and investigate annotation practices for vision models by focusing on the values that are prioritized by industrial decision-makers and practitioners. The quality of data is critical for machine learning models as it holds the power to (mis-)represent the population it is intended to analyze. For autonomous systems to be able to make sense of the world, humans first need to make sense of the data these systems will be trained on. This paper addresses this issue, guided by the following research questions: Which goals are prioritized by decision-makers at the data annotation stage? How do these priorities correlate with data-related bias issues? Focusing on work practices and their context, our research goal aims at understanding the logics driving companies and their impact on the performed annotations. The study follows a qualitative design and is based on 24 interviews with relevant actors and extensive participatory observations, including several weeks of fieldwork at two companies dedicated to data annotation for vision models in Buenos Aires, Argentina and Sofia, Bulgaria. The prevalence of market-oriented values over socially responsible approaches is argued based on three corporate priorities that inform work practices in this field and directly shape the annotations performed: profit (short deadlines connected to the strive for profit are prioritized over alternative approaches that could prevent biased outcomes), standardization (the strive for standardized and, in many cases, reductive or biased annotations to make data fit the products and revenue plans of clients), and opacity (related to client's power to impose their criteria on the annotations that are performed. Criteria that most of the times remain opaque due to corporate confidentiality). Finally, we introduce three elements, aiming at developing ethics-oriented practices of data annotation, that could help prevent biased outcomes: transparency (regarding the documentation of data transformations, including information on responsibilities and criteria for decision-making.), education (training on the potential harms caused by AI and its ethical implications, that could help data annotators and related roles adopt a more critical approach towards the interpretation and labeling of data), and regulations (clear guidelines for ethical AI developed at the governmental level and applied both in private and public organizations).","PeriodicalId":93612,"journal":{"name":"Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society","volume":"449 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Biased Priorities, Biased Outcomes: Three Recommendations for Ethics-oriented Data Annotation Practices\",\"authors\":\"Gunay Kazimzade, Milagros Miceli\",\"doi\":\"10.1145/3375627.3375809\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we analyze the relation between data-related biases and practices of data annotation, by placing them in the context of market economy. 
We understand annotation as a praxis related to the sensemaking of data and investigate annotation practices for vision models by focusing on the values that are prioritized by industrial decision-makers and practitioners. The quality of data is critical for machine learning models as it holds the power to (mis-)represent the population it is intended to analyze. For autonomous systems to be able to make sense of the world, humans first need to make sense of the data these systems will be trained on. This paper addresses this issue, guided by the following research questions: Which goals are prioritized by decision-makers at the data annotation stage? How do these priorities correlate with data-related bias issues? Focusing on work practices and their context, our research goal aims at understanding the logics driving companies and their impact on the performed annotations. The study follows a qualitative design and is based on 24 interviews with relevant actors and extensive participatory observations, including several weeks of fieldwork at two companies dedicated to data annotation for vision models in Buenos Aires, Argentina and Sofia, Bulgaria. The prevalence of market-oriented values over socially responsible approaches is argued based on three corporate priorities that inform work practices in this field and directly shape the annotations performed: profit (short deadlines connected to the strive for profit are prioritized over alternative approaches that could prevent biased outcomes), standardization (the strive for standardized and, in many cases, reductive or biased annotations to make data fit the products and revenue plans of clients), and opacity (related to client's power to impose their criteria on the annotations that are performed. Criteria that most of the times remain opaque due to corporate confidentiality). 
Finally, we introduce three elements, aiming at developing ethics-oriented practices of data annotation, that could help prevent biased outcomes: transparency (regarding the documentation of data transformations, including information on responsibilities and criteria for decision-making.), education (training on the potential harms caused by AI and its ethical implications, that could help data annotators and related roles adopt a more critical approach towards the interpretation and labeling of data), and regulations (clear guidelines for ethical AI developed at the governmental level and applied both in private and public organizations).\",\"PeriodicalId\":93612,\"journal\":{\"name\":\"Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society\",\"volume\":\"449 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-02-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3375627.3375809\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3375627.3375809","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Biased Priorities, Biased Outcomes: Three Recommendations for Ethics-oriented Data Annotation Practices
In this paper, we analyze the relation between data-related biases and data annotation practices by placing them in the context of the market economy. We understand annotation as a praxis related to the sensemaking of data and investigate annotation practices for vision models, focusing on the values prioritized by industrial decision-makers and practitioners. Data quality is critical for machine learning models because data holds the power to (mis-)represent the population it is intended to analyze. Before autonomous systems can make sense of the world, humans first need to make sense of the data these systems are trained on. This paper addresses this issue, guided by two research questions: Which goals do decision-makers prioritize at the data annotation stage? How do these priorities correlate with data-related bias issues? Focusing on work practices and their context, our research aims to understand the logics driving annotation companies and how those logics shape the annotations performed. The study follows a qualitative design and is based on 24 interviews with relevant actors and extensive participant observation, including several weeks of fieldwork at two companies dedicated to data annotation for vision models, located in Buenos Aires, Argentina, and Sofia, Bulgaria. We argue that market-oriented values prevail over socially responsible approaches, based on three corporate priorities that inform work practices in this field and directly shape the annotations performed: profit (short deadlines tied to the pursuit of profit are prioritized over alternative approaches that could prevent biased outcomes), standardization (the push for standardized and, in many cases, reductive or biased annotations that make data fit clients' products and revenue plans), and opacity (clients' power to impose their criteria on the annotations performed, criteria that most of the time remain opaque due to corporate confidentiality). Finally, we introduce three elements aimed at developing ethics-oriented data annotation practices that could help prevent biased outcomes: transparency (documentation of data transformations, including information on responsibilities and criteria for decision-making), education (training on the potential harms of AI and its ethical implications, which could help data annotators and related roles adopt a more critical approach to the interpretation and labeling of data), and regulation (clear guidelines for ethical AI developed at the governmental level and applied in both private and public organizations).
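To make the transparency recommendation concrete, the following is a minimal sketch (ours, not from the paper) of how an annotation pipeline could document who labeled what, under which criteria, and at whose request. All names here (AnnotationRecord, export_audit_log, the field names) are illustrative assumptions, not part of the authors' study.

# Minimal sketch of annotation provenance for transparency; illustrative only.
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
import json


@dataclass
class AnnotationRecord:
    image_id: str
    label: str
    annotator_id: str       # who performed the annotation (responsibility)
    guideline_version: str  # which written criteria were applied
    requested_by: str       # which client/project imposed the criteria
    notes: str = ""         # free-text rationale for ambiguous cases
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def export_audit_log(records: list[AnnotationRecord], path: str) -> None:
    # Write annotation provenance to a JSON file for later review.
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(r) for r in records], f, indent=2)


if __name__ == "__main__":
    record = AnnotationRecord(
        image_id="img_00042",
        label="pedestrian",
        annotator_id="annotator_17",
        guideline_version="client-guidelines-v3",
        requested_by="client_A",
        notes="Partially occluded; labeled per section 2.4 of the guidelines.",
    )
    export_audit_log([record], "annotation_audit.json")

Recording guideline_version and requested_by alongside each label is one way to make otherwise opaque client criteria auditable after the fact, which speaks directly to the opacity problem described in the abstract.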