输入数据复杂性对众包质量的影响

Christopher Tauchmann, Johannes Daxenberger, Margot Mieskes
{"title":"输入数据复杂性对众包质量的影响","authors":"Christopher Tauchmann, Johannes Daxenberger, Margot Mieskes","doi":"10.1145/3379336.3381499","DOIUrl":null,"url":null,"abstract":"Crowdsourcing has a huge impact on data gathering for NLP tasks. However, most quality control measures rely on data aggregation methods which are only employed after the crowdsourcing process and thus cannot deal with different worker qualifications during data gathering. This is time-consuming and cost-ineffective because some datapoints might have to be re-labeled or discarded. Training workers and distributing work according to worker qualifications beforehand helps to overcome this limitation. We propose a setup that accounts for input data complexity and allows only a set of workers that successfully completed tasks of rising complexity to continue work on more difficult subsets. Like this, we are able to train workers and at the same time exclude unqualified workers. In initial experiments, our method achieves higher agreement with four annotations by qualified crowd workers compared to five annotations from random crowd workers on the same dataset.","PeriodicalId":335081,"journal":{"name":"Proceedings of the 25th International Conference on Intelligent User Interfaces Companion","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"The Influence of Input Data Complexity on Crowdsourcing Quality\",\"authors\":\"Christopher Tauchmann, Johannes Daxenberger, Margot Mieskes\",\"doi\":\"10.1145/3379336.3381499\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Crowdsourcing has a huge impact on data gathering for NLP tasks. However, most quality control measures rely on data aggregation methods which are only employed after the crowdsourcing process and thus cannot deal with different worker qualifications during data gathering. This is time-consuming and cost-ineffective because some datapoints might have to be re-labeled or discarded. Training workers and distributing work according to worker qualifications beforehand helps to overcome this limitation. We propose a setup that accounts for input data complexity and allows only a set of workers that successfully completed tasks of rising complexity to continue work on more difficult subsets. Like this, we are able to train workers and at the same time exclude unqualified workers. In initial experiments, our method achieves higher agreement with four annotations by qualified crowd workers compared to five annotations from random crowd workers on the same dataset.\",\"PeriodicalId\":335081,\"journal\":{\"name\":\"Proceedings of the 25th International Conference on Intelligent User Interfaces Companion\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-03-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 25th International Conference on Intelligent User Interfaces Companion\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3379336.3381499\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th International Conference on Intelligent User Interfaces Companion","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3379336.3381499","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

摘要

众包对NLP任务的数据收集产生了巨大的影响。然而,大多数质量控制措施依赖于数据汇总方法,这些方法只在众包过程之后使用,因此无法在数据收集过程中处理不同的工人资格。这既耗时又低成本,因为有些数据点可能不得不重新标记或丢弃。事先对工人进行培训并根据工人资格分配工作有助于克服这一限制。我们提出了一个考虑输入数据复杂性的设置,并且只允许一组成功完成复杂性不断上升的任务的工作人员继续在更困难的子集上工作。这样,我们既可以培训工人,同时又可以排除不合格的工人。在最初的实验中,我们的方法与同一数据集上来自随机人群工作者的五个注释相比,获得了更高的一致性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
The Influence of Input Data Complexity on Crowdsourcing Quality
Crowdsourcing has a huge impact on data gathering for NLP tasks. However, most quality control measures rely on data aggregation methods which are only employed after the crowdsourcing process and thus cannot deal with different worker qualifications during data gathering. This is time-consuming and cost-ineffective because some datapoints might have to be re-labeled or discarded. Training workers and distributing work according to worker qualifications beforehand helps to overcome this limitation. We propose a setup that accounts for input data complexity and allows only a set of workers that successfully completed tasks of rising complexity to continue work on more difficult subsets. Like this, we are able to train workers and at the same time exclude unqualified workers. In initial experiments, our method achieves higher agreement with four annotations by qualified crowd workers compared to five annotations from random crowd workers on the same dataset.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信