Cost-Effective Quality Assurance in Crowd Labeling

Jing Wang, Panagiotis G. Ipeirotis, F. Provost
{"title":"Cost-Effective Quality Assurance in Crowd Labeling","authors":"Jing Wang, Panagiotis G. Ipeirotis, F. Provost","doi":"10.1287/isre.2016.0661","DOIUrl":null,"url":null,"abstract":"The emergence of online paid micro-crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), allows on-demand and at scale distribution of tasks to human workers around the world. In such settings, online workers come and complete small tasks posted by an employer, working for as long or as little as they wish, a process that eliminates the overhead of the hiring (and dismissal). This flexibility introduces a different set of inefficiencies: verifying the quality of every submitted piece of work is an expensive operation, which often requires the same level of effort as performing the task itself. A number of research challenges arise in such settings. How can we ensure that the submitted work is accurate? What allocation strategies can be employed to make the best use of the available labor force? How to appropriately assess the performance of individual workers? In this paper, we consider labeling tasks and develop a comprehensive scheme for managing the quality of crowd labeling: First, we present several algorithms for inferring the true classes of objects and the quality of participating workers, assuming the labels are collected all at once before the inference. Next, we allow employers to adaptively decide which object to assign to the next arriving worker and propose several heuristic-based dynamic label allocation strategies to achieve the desired data quality with significantly fewer labels. Experimental results on both simulated and real data confirm the superior performance of the proposed allocation strategies over other existing policies. Finally, we introduce two novel metrics that can be used to objectively rank the performance of crowdsourced workers, after fixing correctable worker errors and taking into account the costs of different classification errors. In particular, the worker value metric directly measures the monetary value contributed by each label of the worker towards meeting the quality requirements and may provide a basis for the design of fair and efficient compensation schemes.","PeriodicalId":432527,"journal":{"name":"IRPN: Innovation & Human Resource Management (Topic)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IRPN: Innovation & Human Resource Management (Topic)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1287/isre.2016.0661","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 33

Abstract

The emergence of online paid micro-crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), allows on-demand, at-scale distribution of tasks to human workers around the world. In such settings, online workers come and complete small tasks posted by an employer, working for as long or as little as they wish, a process that eliminates the overhead of hiring (and dismissal). This flexibility introduces a different set of inefficiencies: verifying the quality of every submitted piece of work is an expensive operation, which often requires the same level of effort as performing the task itself. A number of research challenges arise in such settings. How can we ensure that the submitted work is accurate? What allocation strategies can be employed to make the best use of the available labor force? How can we appropriately assess the performance of individual workers? In this paper, we consider labeling tasks and develop a comprehensive scheme for managing the quality of crowd labeling. First, we present several algorithms for inferring the true classes of objects and the quality of the participating workers, assuming the labels are collected all at once before the inference. Next, we allow employers to adaptively decide which object to assign to the next arriving worker, and we propose several heuristic-based dynamic label allocation strategies that achieve the desired data quality with significantly fewer labels. Experimental results on both simulated and real data confirm the superior performance of the proposed allocation strategies over existing policies. Finally, we introduce two novel metrics that can be used to objectively rank the performance of crowdsourced workers, after fixing correctable worker errors and taking into account the costs of different classification errors. In particular, the worker value metric directly measures the monetary value contributed by each of a worker's labels toward meeting the quality requirements, and it may provide a basis for the design of fair and efficient compensation schemes.
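The batch inference the abstract describes, jointly estimating each object's true class and each worker's quality, is in the tradition of Dawid and Skene's EM estimator for worker confusion matrices, which this line of work builds upon. The sketch below is a minimal illustration of that general approach, not a reproduction of the paper's algorithms; the function name `em_infer` and the (object, worker, label) input format are assumptions made for illustration.

```python
# Minimal Dawid-Skene-style EM sketch (illustrative; not the paper's exact
# algorithm). Labels are integer class ids in {0, ..., n_classes - 1}.
import numpy as np

def em_infer(labels, n_classes, n_iters=50):
    """labels: list of (object_id, worker_id, label) triples.
    Returns (post, conf): class posteriors per object and one estimated
    confusion matrix per worker (the worker-quality estimate)."""
    objects = sorted({o for o, _, _ in labels})
    workers = sorted({w for _, w, _ in labels})
    obj_idx = {o: i for i, o in enumerate(objects)}
    wrk_idx = {w: i for i, w in enumerate(workers)}
    n_obj, n_wrk = len(objects), len(workers)

    # Initialize class posteriors with (smoothed) majority-vote proportions.
    post = np.full((n_obj, n_classes), 1e-9)
    for o, w, l in labels:
        post[obj_idx[o], l] += 1.0
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # M-step: class priors and per-worker confusion matrices,
        # conf[w, k, l] = P(worker w reports l | true class is k).
        prior = post.mean(axis=0)
        conf = np.full((n_wrk, n_classes, n_classes), 1e-9)
        for o, w, l in labels:
            conf[wrk_idx[w], :, l] += post[obj_idx[o]]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute each object's class posterior from the prior
        # and the likelihood of the labels it received.
        log_post = np.tile(np.log(prior), (n_obj, 1))
        for o, w, l in labels:
            log_post[obj_idx[o]] += np.log(conf[wrk_idx[w], :, l])
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post, conf

# Example: three workers label two objects with binary classes.
labels = [(0, "w1", 0), (0, "w2", 0), (0, "w3", 1),
          (1, "w1", 1), (1, "w2", 1), (1, "w3", 1)]
post, conf = em_infer(labels, n_classes=2)
```

The returned confusion matrices play the role of the "quality of participating workers" in the abstract: a worker whose matrix is far from diagonal is noisy, while a worker with a systematically permuted matrix makes errors that are correctable in principle, which is the premise behind the paper's worker value metric.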
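The adaptive allocation step can likewise be illustrated with a simple uncertainty-driven rule: give the next arriving worker the object whose current class posterior is least settled. This is one plausible heuristic in the spirit of the paper's dynamic label allocation strategies, not one taken from it; `next_object_to_label`, the per-object budget, and the posterior-matrix argument are all assumed for illustration.

```python
import numpy as np

def next_object_to_label(post, label_counts, max_labels=20):
    """Return the index of the unfinished object whose class posterior has
    the highest entropy (i.e., is most uncertain).

    post: (n_objects, n_classes) class posteriors, e.g., from em_infer.
    label_counts: (n_objects,) labels collected so far per object.
    max_labels: illustrative per-object labeling budget.
    """
    entropy = -(post * np.log(post + 1e-12)).sum(axis=1)
    entropy[label_counts >= max_labels] = -np.inf  # skip exhausted objects
    return int(entropy.argmax())
```

In a loop, the employer would collect the new label, rerun (or incrementally update) the inference, and repeat until every object's posterior is confident enough; concentrating labels on uncertain objects is how such strategies can reach the desired quality with significantly fewer labels.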