Worker Filtering with Limited Supervision in Crowdsourcing Systems
Lingyu Lyu, M. Kantardzic, Hanqing Hu
2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 802-807, 2018-12-01
DOI: 10.1109/ICMLA.2018.00128 (https://doi.org/10.1109/ICMLA.2018.00128)
To obtain high-quality labels, it is important to recognize and handle noisy workers in crowdsourcing applications. In particular, spam workers, who assign labels to items at random, can greatly degrade crowdsourced label quality. We therefore propose a semi-supervised worker filtering (SWF) approach to filter this type of worker out of the crowd. The SWF model recognizes spam workers by using a limited set of gold truths. An optimization-based truth discovery framework, which minimizes the total error residing in workers' labels, is integrated with the semi-supervised worker filtering approach (SWF-TD) to infer the true labels of unlabeled items. The efficacy of the proposed methodology is demonstrated on both synthetic and real-world datasets. The experimental analysis on real-world datasets showed that, with around 40% of gold truths used as prior knowledge, the SWF-TD approach can achieve performance similar to that of the fully labeled worker filtering model.
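
The abstract does not state the SWF scoring rule or the exact truth discovery objective, so the following Python sketch is only an illustration of the general two-stage idea, not the authors' method. It assumes a worker is flagged as spam when their accuracy on the gold items is not clearly above chance, and it swaps in a generic iterative weighted-voting scheme (in the style of conflict-resolution truth discovery) for the optimization step. The function names `filter_spammers` and `truth_discovery`, the `margin` parameter, and the smoothing constant are all hypothetical.

```python
import math
from collections import defaultdict


def filter_spammers(labels, gold, num_classes, margin=0.15):
    """Keep workers whose accuracy on gold items beats chance by `margin`.

    labels: {worker: {item: label}}, gold: {item: true_label}.
    The chance-plus-margin rule is an assumption made for illustration.
    """
    chance = 1.0 / num_classes
    kept = {}
    for worker, answers in labels.items():
        scored = [(item, lab) for item, lab in answers.items() if item in gold]
        if not scored:
            # No gold evidence for this worker: keep them (assumption).
            kept[worker] = answers
            continue
        accuracy = sum(lab == gold[item] for item, lab in scored) / len(scored)
        if accuracy >= chance + margin:
            kept[worker] = answers
    return kept


def truth_discovery(labels, gold, iterations=10):
    """Alternate weighted voting and worker-weight updates.

    Gold items stay fixed at their known labels; other items take the
    label with the largest total worker weight.
    """
    weights = {worker: 1.0 for worker in labels}
    truths = dict(gold)
    for _ in range(iterations):
        # Step 1: weighted vote per item.
        votes = defaultdict(lambda: defaultdict(float))
        for worker, answers in labels.items():
            for item, lab in answers.items():
                votes[item][lab] += weights[worker]
        for item, tally in votes.items():
            if item not in gold:
                # sorted() makes tie-breaking deterministic.
                truths[item] = max(sorted(tally), key=tally.get)
        # Step 2: a worker's weight shrinks as their share of the error grows.
        errors = {
            worker: 0.5 + sum(lab != truths[item] for item, lab in answers.items())
            for worker, answers in labels.items()
        }
        total = sum(errors.values())
        weights = {worker: -math.log(errors[worker] / total) for worker in labels}
    return truths
```

A caller would first run `filter_spammers` on all collected labels with the available gold subset, then pass the surviving workers' labels to `truth_discovery` to estimate labels for the remaining items. The 0.5 smoothing term only prevents a zero-error worker from producing `log(0)`.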