End-to-End Learning from Noisy Crowd to Supervised Machine Learning Models

Taraneh Younesian, Chi Hong, Amirmasoud Ghiassi, R. Birke, L. Chen
Published in: 2020 IEEE Second International Conference on Cognitive Machine Intelligence (CogMI), 2020-10-01
DOI: 10.1109/CogMI50398.2020.00013 (https://doi.org/10.1109/CogMI50398.2020.00013)
Citations: 1

Abstract

Labeling real-world datasets is time-consuming but indispensable for supervised machine learning models. A common solution is to distribute the labeling task across a large number of non-expert workers via crowd-sourcing. Because crowd workers vary in background and experience, the obtained labels are highly error-prone and can even be detrimental to the learning models. In this paper, we advocate using hybrid intelligence, i.e., combining deep models and human experts, to design an end-to-end learning framework from noisy crowd-sourced data, especially in an on-line scenario. We first summarize the state-of-the-art solutions that address the challenges of noisy labels from non-expert crowds and that learn from multiple annotators. We show how label aggregation can benefit from estimating the annotators' confusion matrices to improve the learning process. Moreover, with the help of an expert labeler as well as classifiers, we cleanse the aggregated labels of highly informative samples to enhance the final classification accuracy. We demonstrate the effectiveness of our strategies on several image datasets, i.e., UCI and CIFAR-10, using SVM and deep neural networks. Our evaluation shows that our on-line label aggregation with confusion matrix estimation reduces the error rate of labels by over 30%. Furthermore, having the expert relabel only 10% of the data yields over 90% classification accuracy with SVM.
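To illustrate what label aggregation with per-annotator confusion matrices can look like, the following is a minimal sketch of a batch, Dawid-Skene-style EM procedure next to plain majority voting. It is not the on-line method evaluated in the paper, which the abstract does not specify. The function names (majority_vote, dawid_skene), the -1 convention for a missing label, and the smoothing constant are all assumptions made for this example.

```python
import numpy as np


def vote_counts(labels, n_classes):
    """Count votes per class for each item; entries equal to -1 are ignored."""
    return np.stack([(labels == k).sum(axis=1) for k in range(n_classes)], axis=1)


def majority_vote(labels, n_classes):
    """Baseline aggregation: pick the most frequent label per item."""
    return vote_counts(labels, n_classes).argmax(axis=1)


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Batch EM sketch (assumed setup, not the paper's on-line algorithm).

    labels: int array of shape (n_items, n_workers), -1 where a worker
            did not label an item.
    Returns (aggregated_labels, conf), where conf[w, j, k] estimates
    P(worker w reports k | true class is j).
    """
    n_items, n_workers = labels.shape
    counts = vote_counts(labels, n_classes).astype(float) + smoothing
    post = counts / counts.sum(axis=1, keepdims=True)  # soft majority vote

    for _ in range(n_iter):
        # M-step: class prior and one confusion matrix per worker.
        prior = post.mean(axis=0)
        conf = np.full((n_workers, n_classes, n_classes), smoothing)
        for w in range(n_workers):
            for k in range(n_classes):
                conf[w, :, k] += post[labels[:, w] == k].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: posterior over the true class of each item.
        log_post = np.tile(np.log(prior + 1e-12), (n_items, 1))
        for w in range(n_workers):
            seen = labels[:, w] >= 0
            log_post[seen] += np.log(conf[w][:, labels[seen, w]]).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

    return post.argmax(axis=1), conf
```

Calling dawid_skene(labels, n_classes) on an items-by-workers label matrix returns the aggregated labels together with the estimated confusion matrices. In this sketch the matrices down-weight annotators whose reports are systematically unreliable, which is the intuition behind aggregation schemes that go beyond majority voting.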