Focus Annotation of Task-based Data: Establishing the Quality of Crowd Annotation

Kordula De Kuthy, Ramon Ziai, Detmar Meurers
{"title":"Focus Annotation of Task-based Data: Establishing the Quality of Crowd Annotation","authors":"K. Kuthy, Ramon Ziai, Walt Detmar Meurers","doi":"10.18653/v1/W16-1713","DOIUrl":null,"url":null,"abstract":"We explore the annotation of information structure in German and compare the quality of expert annotation with crowdsourced annotation taking into account the cost of reaching crowd consensus. Concretely, we discuss a crowd-sourcing effort annotating focus in a task-based corpus of German containing reading comprehension questions and answers. Against the backdrop of a gold standard reference resulting from adjudicated expert annotation, we evaluate a crowd sourcing experiment using majority voting to determine a baseline performance. To refine the crowd-sourcing setup, we introduce the Consensus Cost as a measure of agreement within the crowd. We investigate the usefulness of Consensus Cost as a measure of crowd annotation quality both intrinsically, in relation to the expert gold standard, and extrinsically, by integrating focus annotation information into a system performing Short Answer Assessment taking into account the Consensus Cost. We find that low Consensus Cost in crowd sourcing indicates high quality, though high cost does not necessarily indicate low accuracy but increased variability. Overall, taking Consensus Cost into account improves both intrinsic and extrinsic evaluation measures.","PeriodicalId":150065,"journal":{"name":"Proceedings of the 10th Linguistic Annotation Workshop held in\n conjunction with ACL 2016 (LAW-X 2016)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th Linguistic Annotation Workshop held in\n conjunction with ACL 2016 (LAW-X 2016)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/W16-1713","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

We explore the annotation of information structure in German and compare the quality of expert annotation with crowdsourced annotation, taking into account the cost of reaching crowd consensus. Concretely, we discuss a crowdsourcing effort annotating focus in a task-based corpus of German containing reading comprehension questions and answers. Against the backdrop of a gold-standard reference resulting from adjudicated expert annotation, we evaluate a crowdsourcing experiment using majority voting to determine a baseline performance. To refine the crowdsourcing setup, we introduce the Consensus Cost as a measure of agreement within the crowd. We investigate the usefulness of Consensus Cost as a measure of crowd annotation quality both intrinsically, in relation to the expert gold standard, and extrinsically, by integrating focus annotation information into a system performing Short Answer Assessment that takes the Consensus Cost into account. We find that low Consensus Cost in crowdsourcing indicates high quality; high cost does not necessarily indicate low accuracy, but rather increased variability. Overall, taking Consensus Cost into account improves both intrinsic and extrinsic evaluation measures.
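To make the aggregation step concrete, here is a minimal Python sketch of majority voting over crowd labels, paired with an illustrative consensus-cost proxy taken to be the share of votes that deviate from the majority label. The function names and this particular cost definition are assumptions for illustration only; the paper defines Consensus Cost for its own crowdsourcing setup, which may differ.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent crowd label (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def consensus_cost(labels):
    """Illustrative proxy: fraction of votes disagreeing with the majority.

    0.0 means a unanimous crowd (low cost); values approaching 0.5 mean
    the crowd is split (high cost). This is NOT the paper's definition,
    only a stand-in to show the idea of measuring within-crowd agreement.
    """
    majority_count = Counter(labels).most_common(1)[0][1]
    return (len(labels) - majority_count) / len(labels)

# Example: one token's focus annotation by five crowd workers
votes = ["focus", "focus", "background", "focus", "focus"]
print(majority_vote(votes))   # focus
print(consensus_cost(votes))  # 0.2 -> high agreement, low cost
```

Under this proxy, a cost of 0 corresponds to a unanimous crowd, mirroring the finding that low Consensus Cost signals high annotation quality, while a split vote yields a high cost without necessarily implying that the majority label is wrong.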