{"title":"Effect of Label Redundancy in Crowdsourcing for Training Machine Learning Models","authors":"Ayame Shimizu, Kei Wakabayashi","doi":"10.26421/jdi3.3-1","DOIUrl":null,"url":null,"abstract":"Crowdsourcing is widely utilized for collecting labeled examples to train supervised machine learning models, but the labels obtained from workers are considerably noisier than those from expert annotators. To address the noisy label issue, most researchers adopt the repeated labeling strategy, where multiple (redundant) labels are collected for each example and then aggregated. Although this improves the annotation quality, it decreases the amount of training data when the budget for crowdsourcing is limited, which is a negative factor in terms of the accuracy of the machine learning model to be trained. This paper empirically examines the extent to which repeated labeling contributes to the accuracy of machine learning models for image classification, named entity recognition and sentiment analysis under various conditions of budget and worker quality. We experimentally examined four hypotheses related to the effect of budget, worker quality, task difficulty, and redundancy on crowdsourcing. The results on image classification and named entity recognition supported all four hypotheses and suggested that repeated labeling almost always has a negative impact on machine learning when it comes to accuracy. Somewhat surprisingly, the results on sentiment analysis using pretrained models did not support the hypothesis which shows the possibility of remaining utilization of multiple-labeling.","PeriodicalId":232625,"journal":{"name":"J. Data Intell.","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Data Intell.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26421/jdi3.3-1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Crowdsourcing is widely used to collect labeled examples for training supervised machine learning models, but the labels obtained from workers are considerably noisier than those from expert annotators. To address the noisy-label issue, most researchers adopt a repeated labeling strategy, in which multiple (redundant) labels are collected for each example and then aggregated. Although this improves annotation quality, it reduces the amount of training data when the crowdsourcing budget is limited, which in turn hurts the accuracy of the machine learning model to be trained. This paper empirically examines the extent to which repeated labeling contributes to the accuracy of machine learning models for image classification, named entity recognition, and sentiment analysis under various conditions of budget and worker quality. We experimentally examined four hypotheses concerning the effects of budget, worker quality, task difficulty, and redundancy on crowdsourcing. The results on image classification and named entity recognition supported all four hypotheses and suggested that repeated labeling almost always has a negative impact on the accuracy of the trained model. Somewhat surprisingly, the results on sentiment analysis using pretrained models did not support the hypotheses, suggesting that repeated labeling may still be useful in some settings.
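The abstract does not state which aggregation rule the experiments use, but the standard baseline for repeated labeling is majority voting over the redundant labels. The sketch below is only an illustration of that baseline and of the budget trade-off it implies (the function name and toy data are not from the paper):

```python
from collections import Counter

def aggregate_majority_vote(labels_per_example):
    """Aggregate redundant crowd labels by majority vote.

    labels_per_example: dict mapping example id -> list of worker labels.
    Returns a dict mapping example id -> single aggregated label.
    Ties are broken arbitrarily among the tied labels.
    """
    return {
        example_id: Counter(labels).most_common(1)[0][0]
        for example_id, labels in labels_per_example.items()
    }

if __name__ == "__main__":
    # With a fixed budget of 6 labels, redundancy 3 yields only 2 aggregated
    # training examples, whereas redundancy 1 would yield 6 noisier ones --
    # the trade-off the paper studies.
    crowd_labels = {
        "img_001": ["cat", "cat", "dog"],
        "img_002": ["dog", "dog", "dog"],
    }
    print(aggregate_majority_vote(crowd_labels))
    # {'img_001': 'cat', 'img_002': 'dog'}
```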