Machine learning and the politics of synthetic data

IF 6.5 1区社会学 Q1 SOCIAL SCIENCES, INTERDISCIPLINARY

Big Data & Society Pub Date : 2023-01-01 DOI:10.1177/20539517221145372

Benjamin N. Jacobsen

{"title":"Machine learning and the politics of synthetic data","authors":"Benjamin N. Jacobsen","doi":"10.1177/20539517221145372","DOIUrl":null,"url":null,"abstract":"Machine-learning algorithms have become deeply embedded in contemporary society. As such, ample attention has been paid to the contents, biases, and underlying assumptions of the training datasets that many algorithmic models are trained on. Yet, what happens when algorithms are trained on data that are not real, but instead data that are ‘synthetic’, not referring to real persons, objects, or events? Increasingly, synthetic data are being incorporated into the training of machine-learning algorithms for use in various societal domains. There is currently little understanding, however, of the role played by and the ethicopolitical implications of synthetic training data for machine-learning algorithms. In this article, I explore the politics of synthetic data through two central aspects: first, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, the paper explores how synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand the ways in which machine-learning algorithms are envisioned in the light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.","PeriodicalId":47834,"journal":{"name":"Big Data & Society","volume":" ","pages":""},"PeriodicalIF":6.5000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big Data & Society","FirstCategoryId":"90","ListUrlMain":"https://doi.org/10.1177/20539517221145372","RegionNum":1,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SOCIAL SCIENCES, INTERDISCIPLINARY","Score":null,"Total":0}

引用次数: 3

Abstract

Machine-learning algorithms have become deeply embedded in contemporary society. As such, ample attention has been paid to the contents, biases, and underlying assumptions of the training datasets that many algorithmic models are trained on. Yet, what happens when algorithms are trained on data that are not real, but instead data that are ‘synthetic’, not referring to real persons, objects, or events? Increasingly, synthetic data are being incorporated into the training of machine-learning algorithms for use in various societal domains. There is currently little understanding, however, of the role played by and the ethicopolitical implications of synthetic training data for machine-learning algorithms. In this article, I explore the politics of synthetic data through two central aspects: first, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, the paper explores how synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand the ways in which machine-learning algorithms are envisioned in the light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.

查看原文本刊更多论文

机器学习和合成数据的政治

机器学习算法已经深深植根于当代社会。因此，人们对许多算法模型所训练的训练数据集的内容、偏见和基本假设给予了充分的关注。然而，当算法训练在非真实数据上，而是“合成”数据上，而不是指真实的人、对象或事件时，会发生什么？合成数据越来越多地被纳入机器学习算法的训练中，用于各种社会领域。然而，目前对机器学习算法的合成训练数据所起的作用及其伦理意义知之甚少。在这篇文章中，我通过两个核心方面探讨了合成数据的政治性：首先，合成数据有望成为算法可变性的丰富来源。其次，本文探讨了合成数据如何承诺将算法置于风险领域之外。我建议，对这两个领域的分析将有助于我们更好地理解根据合成数据设想机器学习算法的方式，以及合成训练数据如何积极地重新配置当代社会机器学习的可能性条件。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Big Data & Society SOCIAL SCIENCES, INTERDISCIPLINARY-

CiteScore

10.90

自引率

10.60%

发文量

审稿时长

11 weeks

期刊介绍： Big Data & Society (BD&S) is an open access, peer-reviewed scholarly journal that publishes interdisciplinary work principally in the social sciences, humanities, and computing and their intersections with the arts and natural sciences. The journal focuses on the implications of Big Data for societies and aims to connect debates about Big Data practices and their effects on various sectors such as academia, social life, industry, business, and government. BD&S considers Big Data as an emerging field of practices, not solely defined by but generative of unique data qualities such as high volume, granularity, data linking, and mining. The journal pays attention to digital content generated both online and offline, encompassing social media, search engines, closed networks (e.g., commercial or government transactions), and open networks like digital archives, open government, and crowdsourced data. Rather than providing a fixed definition of Big Data, BD&S encourages interdisciplinary inquiries, debates, and studies on various topics and themes related to Big Data practices. BD&S seeks contributions that analyze Big Data practices, involve empirical engagements and experiments with innovative methods, and reflect on the consequences of these practices for the representation, realization, and governance of societies. As a digital-only journal, BD&S's platform can accommodate multimedia formats such as complex images, dynamic visualizations, videos, and audio content. The contents of the journal encompass peer-reviewed research articles, colloquia, bookcasts, think pieces, state-of-the-art methods, and work by early career researchers.