Efficient crowdsourcing for multi-class labeling

Measurement and Modeling of Computer Systems | Pub Date: 2013-06-01 | DOI: 10.1145/2465529.2465761

David R Karger, Sewoong Oh, D. Shah


Citations: 165

Abstract

Crowdsourcing systems like Amazon's Mechanical Turk have emerged as an effective large-scale, human-powered platform for performing tasks in domains such as image classification, data entry, recommendation, and proofreading. Since workers are low-paid (a few cents per task) and the tasks are monotonous, the answers obtained are noisy and hence unreliable. To obtain reliable estimates, it is essential to use appropriate inference algorithms (e.g., majority voting) coupled with structured redundancy through task assignment. Our goal is to obtain the best possible trade-off between reliability and redundancy. In this paper, we consider a general probabilistic model for noisy observations in crowdsourcing systems and pose the problem of minimizing the total price (i.e., redundancy) that must be paid to achieve a target overall reliability. Concretely, we show that it is possible to answer each task correctly with probability 1-ε as long as the redundancy per task is O((K/q) log (K/ε)), where each task is equally likely to have any of K distinct answers and q is the crowd-quality parameter defined through a probabilistic model. Further, this is effectively the best possible redundancy-accuracy trade-off any system design can achieve. Such a crisp, single-parameter characterization of the (order-)optimal trade-off between redundancy and reliability has various useful operational consequences. We also analyze the robustness of our approach in the presence of adversarial workers and provide a bound on their influence on the redundancy-accuracy trade-off. Unlike recent prior work [GKM11, KOS11, KOS11], our result applies to non-binary (i.e., K>2) tasks. In effect, we use algorithms for binary tasks (with an inhomogeneous error model, unlike that in [GKM11, KOS11, KOS11]) as a key subroutine to obtain answers for K-ary tasks. Technically, the algorithm is based on a low-rank approximation of the weighted adjacency matrix of a random regular bipartite graph, weighted according to the answers provided by the workers.
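
The low-rank idea in the last sentence can be illustrated for the binary case only. The sketch below is not the paper's K-ary procedure. It assumes a hypothetical crowd in which a fraction of workers answer correctly with a fixed probability and the rest guess at random, pairs tasks with workers through a configuration-model shuffle (approximately regular), and compares majority voting with the sign of the top left singular vector of the answer matrix. All function names and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def simulate_answers(n_tasks, n_workers, per_task, rng, p_good=0.9, frac_good=0.7):
    """Draw +/-1 answers on a random, approximately regular task-worker graph."""
    # Assumes n_tasks * per_task is divisible by n_workers.
    per_worker = n_tasks * per_task // n_workers
    truth = rng.choice([-1, 1], size=n_tasks)
    # Assumed crowd: reliable workers are right with prob. p_good, the rest guess.
    p_correct = np.where(rng.random(n_workers) < frac_good, p_good, 0.5)
    # Configuration-model pairing of task stubs with shuffled worker stubs.
    task_idx = np.repeat(np.arange(n_tasks), per_task)
    worker_idx = rng.permutation(np.repeat(np.arange(n_workers), per_worker))
    is_right = rng.random(task_idx.size) < p_correct[worker_idx]
    answers = np.where(is_right, truth[task_idx], -truth[task_idx])
    # Weighted adjacency matrix: +/-1 on assigned pairs, 0 elsewhere.
    A = np.zeros((n_tasks, n_workers))
    A[task_idx, worker_idx] = answers  # a repeated pair keeps a single answer
    return A, truth


def majority_vote(A):
    """Baseline: sign of the row sums, with ties broken toward +1."""
    votes = np.sign(A.sum(axis=1))
    return np.where(votes == 0, 1, votes).astype(int)


def rank_one_estimate(A):
    """Estimate labels from the sign of the top left singular vector of A."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    # Singular vectors are defined up to a joint sign flip. Assumption: workers
    # are better than random on average, so the worker vector should sum to >= 0.
    if v.sum() < 0:
        u = -u
    return np.where(u >= 0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, truth = simulate_answers(200, 200, 15, rng)
    print("majority vote accuracy:", np.mean(majority_vote(A) == truth))
    print("rank-1 accuracy:", np.mean(rank_one_estimate(A) == truth))
```

The rank-one step works because, under this assumed model, the expected answer on an assigned pair is the true label times the worker's reliability margin, so the expected matrix is approximately rank one and its leading singular vectors separate task labels from worker quality.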

