End-to-End Learning from Noisy Crowd to Supervised Machine Learning Models

2020 IEEE Second International Conference on Cognitive Machine Intelligence (CogMI), Pub Date: 2020-10-01, DOI: 10.1109/CogMI50398.2020.00013

Taraneh Younesian, Chi Hong, Amirmasoud Ghiassi, R. Birke, L. Chen


Citations: 1

Abstract

Labeling real-world datasets is time-consuming but indispensable for supervised machine learning models. A common solution is to distribute the labeling task across a large number of non-expert workers via crowd-sourcing. Because crowd workers vary in background and experience, the obtained labels are highly prone to errors and can even be detrimental to the learning models. In this paper, we advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data, especially in an on-line scenario. We first summarize the state-of-the-art solutions that address the challenges of noisy labels from non-expert crowds and learn from multiple annotators. We show how label aggregation can benefit from estimating the annotators' confusion matrices to improve the learning process. Moreover, with the help of an expert labeler as well as classifiers, we cleanse the aggregated labels of highly informative samples to enhance the final classification accuracy. We demonstrate the effectiveness of our strategies on several image datasets, i.e., UCI and CIFAR-10, using SVM and deep neural networks. Our evaluation shows that our on-line label aggregation with confusion matrix estimation reduces the error rate of labels by over 30%. Furthermore, relabeling only 10% of the data with the expert results in over 90% classification accuracy with SVM.
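
To make the idea of aggregating labels with per-annotator confusion matrices concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hard-assignment, Dawid-Skene-style loop with a uniform class prior: start from majority vote, estimate each worker's confusion matrix against the current labels, then re-pick each label by log-likelihood. The names `majority_vote`, `estimate_confusion`, `aggregate`, and the `smoothing` parameter are all hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict


def majority_vote(votes, n_classes):
    """votes maps item -> list of (worker, observed_label) pairs."""
    labels = {}
    for item, pairs in votes.items():
        tally = [0] * n_classes
        for _, obs in pairs:
            tally[obs] += 1
        # Ties are broken in favour of the lowest class index.
        labels[item] = max(range(n_classes), key=tally.__getitem__)
    return labels


def estimate_confusion(votes, labels, n_classes, smoothing=1.0):
    """Return conf[worker][true][observed], an estimate of P(observed | true).

    Counts are Laplace-smoothed (an assumption) so no probability is zero.
    """
    counts = defaultdict(
        lambda: [[smoothing] * n_classes for _ in range(n_classes)]
    )
    for item, pairs in votes.items():
        true = labels[item]
        for worker, obs in pairs:
            counts[worker][true][obs] += 1
    return {
        worker: [[c / sum(row) for c in row] for row in matrix]
        for worker, matrix in counts.items()
    }


def aggregate(votes, n_classes, n_iters=10):
    """Alternate between confusion estimation and label re-assignment."""
    labels = majority_vote(votes, n_classes)
    conf = estimate_confusion(votes, labels, n_classes)
    for _ in range(n_iters):
        new_labels = {}
        for item, pairs in votes.items():
            # Log-likelihood of the observed votes under each candidate class.
            scores = [
                sum(math.log(conf[w][k][obs]) for w, obs in pairs)
                for k in range(n_classes)
            ]
            new_labels[item] = max(range(n_classes), key=scores.__getitem__)
        if new_labels == labels:
            break  # labels are stable, so the confusion matrices are too
        labels = new_labels
        conf = estimate_confusion(votes, labels, n_classes)
    return labels, conf
```

Calling `aggregate(votes, n_classes=2)` on a dictionary such as `{0: [("a", 0), ("b", 0), ("c", 1)], ...}` returns the aggregated labels together with the estimated matrices. A worker who systematically flips labels ends up with large off-diagonal entries, so their votes count as evidence for the opposite class rather than as noise. An on-line variant, as in the abstract, would update the counts incrementally as new votes arrive; that is not shown here.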

