Agreements 'in the wild': Standards and alignment in machine learning benchmark dataset construction

Isak Engdahl
Big Data Soc., published 2024-04-03
DOI: 10.1177/20539517241242457 (https://doi.org/10.1177/20539517241242457)

Abstract

This article presents an ethnographic case study of a corporate-academic group constructing a benchmark dataset of daily activities for a variety of machine learning and computer vision tasks. Using a socio-technical perspective, the article conceptualizes the dataset as a knowledge object that is stabilized by both practical standards (for daily activities, datafication, annotation and benchmarks) and alignment work – that is, efforts including forging agreements to make these standards effective in practice. By attending to alignment work, the article highlights the informal, communicative and supportive efforts that underlie the success of standards and the smoothing of tensions between actors and factors. Emphasizing these efforts constitutes a contribution in several ways. This article's ethnographic mode of analysis challenges and supplements quantitative metrics on datasets. It advances the field of dataset analysis by offering a detailed empirical examination of the development of a new benchmark dataset as a collective accomplishment. By showing the importance of alignment efforts and their close ties to standards and their limitations, it adds to our understanding of how machine learning datasets are built. And, most importantly, it calls into question a key characterization of the dataset: that it captures unscripted activities occurring naturally ‘in the wild’, as alignment work bleeds into moments of data capture.