Worker Filtering with Limited Supervision in Crowdsourcing Systems

2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 802-807. Pub Date: 2018-12-01. DOI: 10.1109/ICMLA.2018.00128

Lingyu Lyu, M. Kantardzic, Hanqing Hu


Citations: 1

Abstract

To obtain high-quality labels in crowdsourcing applications, it is important to recognize and handle noisy workers. In particular, spam workers, who assign labels to items at random, can greatly degrade crowdsourced label quality. We therefore propose a semi-supervised worker filtering (SWF) approach that filters this type of worker out of the crowd. The SWF model recognizes spam workers by using a limited set of gold truths. An optimization-based truth discovery framework, which minimizes the total error residing in workers' labels, is integrated with the semi-supervised worker filtering approach (SWF-TD) to infer the true labels of unlabeled items. The efficacy of the proposed methodology is demonstrated on both synthetic and real-world datasets. Experimental analysis on real-world datasets showed that, using around 40% gold truths as prior knowledge, the SWF-TD approach can provide performance similar to that of the fully labeled worker filtering model.
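The two-stage idea in the abstract (filter spam workers using gold truths, then infer the remaining labels by truth discovery) can be illustrated with a minimal Python sketch. The abstract does not give the SWF criterion or the optimization objective, so everything here is an assumption made for illustration: the accuracy-above-chance rule and its `margin` threshold, the iterative weighted-voting scheme standing in for the optimization-based framework, the function names `filter_spammers` and `truth_discovery`, and the toy data. It should not be read as the authors' method.

```python
import math
from collections import defaultdict


def filter_spammers(labels, gold, num_classes, margin=0.15):
    """Keep workers whose accuracy on gold items beats chance by `margin`.

    labels: {worker: {item: label}}, gold: {item: true_label}.
    The threshold rule is an illustrative assumption, not the paper's criterion.
    """
    chance = 1.0 / num_classes
    kept = {}
    for worker, answers in labels.items():
        scored = [item for item in answers if item in gold]
        if not scored:
            # No gold evidence for this worker: keep them (assumption).
            kept[worker] = answers
            continue
        accuracy = sum(answers[i] == gold[i] for i in scored) / len(scored)
        if accuracy > chance + margin:
            kept[worker] = answers
    return kept


def truth_discovery(labels, gold, iterations=10):
    """Alternate between weighted voting and worker re-weighting.

    A generic stand-in for an optimization-based truth discovery step:
    gold items stay fixed, other items take the weighted-majority label,
    and each worker's weight grows as their smoothed error rate shrinks.
    """
    weights = {worker: 1.0 for worker in labels}
    truths = dict(gold)
    for _ in range(iterations):
        # Step 1: weighted vote on every item without a gold truth.
        votes = defaultdict(lambda: defaultdict(float))
        for worker, answers in labels.items():
            for item, label in answers.items():
                if item not in gold:
                    votes[item][label] += weights[worker]
        for item, tally in votes.items():
            # sorted() makes ties resolve to the smallest label.
            truths[item] = max(sorted(tally), key=tally.get)
        # Step 2: update weights from each worker's smoothed error rate.
        for worker, answers in labels.items():
            errors = sum(answers[i] != truths[i] for i in answers)
            rate = (errors + 1) / (len(answers) + 2)
            weights[worker] = -math.log(rate)
    return truths, weights


# Toy data (hypothetical): two gold items, two unlabeled items, four workers.
gold = {"i1": 1, "i2": 0}
labels = {
    "a": {"i1": 1, "i2": 0, "i3": 1, "i4": 0},
    "b": {"i1": 1, "i2": 0, "i3": 1, "i4": 0},
    "c": {"i1": 1, "i2": 0, "i3": 1, "i4": 1},
    "spam": {"i1": 0, "i2": 0, "i3": 0, "i4": 1},
}

kept = filter_spammers(labels, gold, num_classes=2)
truths, weights = truth_discovery(kept, gold)
```

In this toy run the worker named `spam` scores at chance level on the gold items and is dropped before truth discovery, so the inferred labels for `i3` and `i4` come only from the remaining workers. Varying the fraction of items included in `gold` is one way to explore the trade-off the abstract reports between the amount of supervision and the resulting label quality.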
