Agreements 'in the wild': Standards and alignment in machine learning benchmark dataset construction

Big Data Soc. Pub Date: 2024-04-03. DOI: 10.1177/20539517241242457

Isak Engdahl


Citations: 0

Abstract


Agreements 'in the wild': Standards and alignment in machine learning benchmark dataset construction

This article presents an ethnographic case study of a corporate-academic group constructing a benchmark dataset of daily activities for a variety of machine learning and computer vision tasks. Using a socio-technical perspective, the article conceptualizes the dataset as a knowledge object that is stabilized by both practical standards (for daily activities, datafication, annotation and benchmarks) and alignment work – that is, efforts including forging agreements to make these standards effective in practice. By attending to alignment work, the article highlights the informal, communicative and supportive efforts that underlie the success of standards and the smoothing of tensions between actors and factors. Emphasizing these efforts constitutes a contribution in several ways. This article's ethnographic mode of analysis challenges and supplements quantitative metrics on datasets. It advances the field of dataset analysis by offering a detailed empirical examination of the development of a new benchmark dataset as a collective accomplishment. By showing the importance of alignment efforts and their close ties to standards and their limitations, it adds to our understanding of how machine learning datasets are built. And, most importantly, it calls into question a key characterization of the dataset: that it captures unscripted activities occurring naturally ‘in the wild’, as alignment work bleeds into moments of data capture.

