The Influence of Input Data Complexity on Crowdsourcing Quality

Proceedings of the 25th International Conference on Intelligent User Interfaces Companion. Pub Date: 2020-03-17. DOI: 10.1145/3379336.3381499

Christopher Tauchmann, Johannes Daxenberger, Margot Mieskes


Citations: 4

Abstract



The Influence of Input Data Complexity on Crowdsourcing Quality

Crowdsourcing has a huge impact on data gathering for NLP tasks. However, most quality control measures rely on data aggregation methods that are applied only after the crowdsourcing process and thus cannot account for differing worker qualifications during data gathering. This is time-consuming and cost-ineffective because some datapoints might have to be re-labeled or discarded. Training workers and distributing work according to worker qualifications beforehand helps to overcome this limitation. We propose a setup that accounts for input data complexity and allows only workers who have successfully completed tasks of rising complexity to continue on more difficult subsets. In this way, we are able to train workers and at the same time exclude unqualified workers. In initial experiments, our method achieves higher agreement with four annotations by qualified crowd workers than with five annotations from random crowd workers on the same dataset.
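The staged setup described in the abstract can be illustrated with a minimal sketch. The abstract does not say how "successfully completed" is measured, so everything here is an assumption made for illustration: the function name `filter_workers_by_stage`, the use of gold labels, the accuracy criterion, and the `min_accuracy` threshold of 0.8 are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set


def filter_workers_by_stage(
    workers: Set[str],
    stages: List[List[str]],
    gold: Dict[str, str],
    annotate: Callable[[str, str], str],
    min_accuracy: float = 0.8,  # hypothetical pass threshold, not from the paper
) -> List[Set[str]]:
    """Admit workers to progressively harder stages.

    `stages` is ordered from least to most complex. admitted[k] is the set
    of workers allowed to start stage k; the final entry is the set of
    workers who passed every stage.
    """
    current = set(workers)
    admitted = [set(current)]
    for items in stages:
        passed = set()
        for worker in current:
            # Assumed criterion: accuracy against gold labels on this stage.
            correct = sum(annotate(worker, item) == gold[item] for item in items)
            if items and correct / len(items) >= min_accuracy:
                passed.add(worker)
        current = passed
        admitted.append(set(current))
    return admitted


# Toy data: two "easy" items (e1, e2) followed by two "hard" items (h1, h2).
gold = {"e1": "A", "e2": "B", "h1": "A", "h2": "B"}
answers = {
    "w1": {"e1": "A", "e2": "B", "h1": "A", "h2": "B"},
    "w2": {"e1": "A", "e2": "B", "h1": "B", "h2": "A"},
    "w3": {"e1": "B", "e2": "A", "h1": "A", "h2": "B"},
}
stages = [["e1", "e2"], ["h1", "h2"]]
admitted = filter_workers_by_stage(
    set(answers), stages, gold, lambda worker, item: answers[worker][item]
)
```

In this toy run, a worker who fails the easy stage is never shown the hard items, which mirrors the idea of excluding unqualified workers during data gathering rather than filtering their labels afterwards.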


Source journal

Proceedings of the 25th International Conference on Intelligent User Interfaces Companion

Self-citation rate

0.00%

Number of publications