Debiased Label Aggregation for Subjective Crowdsourcing Tasks
S. Wallace, Tianyuan Cai, Brendan Le, Luis A. Leiva
CHI Conference on Human Factors in Computing Systems Extended Abstracts, 2022-04-27
DOI: 10.1145/3491101.3519614
Citations: 2
Abstract
Human Intelligence Tasks (HITs) allow people to collect and curate labeled data from multiple annotators. These labels are often aggregated to create an annotated dataset suitable for supervised machine learning tasks. The most popular label aggregation method is majority voting, where each item in the dataset is assigned the most common label among the annotators. This approach is optimal when annotators are unbiased domain experts. In this paper, we propose Debiased Label Aggregation (DLA), an alternative method for label aggregation in subjective HITs, where cross-annotator agreement varies. DLA leverages annotators' voting behavior patterns to weight labels. Our experiments show that DLA outperforms majority voting on several performance metrics; e.g., an increase of 20 percentage points in the F1 measure before data augmentation, and an increase of 35 percentage points in the same measure after data augmentation. Since DLA is deceptively simple, we hope it will help researchers tackle subjective labeling tasks.
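The contrast between the two aggregation strategies described above can be sketched as follows. Note that the abstract does not specify how DLA derives its weights from voting-behavior patterns, so the weighted variant below is only an illustrative stand-in using hypothetical per-annotator weights, not the paper's actual DLA procedure.

```python
from collections import Counter

def majority_vote(labels):
    """Assign the most common label among annotators (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, weights):
    """Aggregate labels with per-annotator weights (illustrative only;
    the paper derives its weights from voting-behavior patterns, which
    the abstract does not detail)."""
    scores = {}
    for label, w in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Three annotators label one item; the first annotator carries more weight.
print(majority_vote(["cat", "dog", "dog"]))                    # -> dog
print(weighted_vote(["cat", "dog", "dog"], [0.9, 0.2, 0.3]))   # -> cat
```

The example shows why weighting matters in subjective tasks: the unweighted majority follows the two lower-weight annotators, while the weighted scheme can side with a single annotator whose past voting behavior earns more trust.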