Rethinking Quality Assurance for Crowdsourced Multi-ROI Image Segmentation

Xiaolu Lu, David Ratcliffe, Tsu-Ting Kao, Aristarkh Tikhonov, Lester Litchfield, Craig Rodger, Kaier Wang
{"title":"Rethinking Quality Assurance for Crowdsourced Multi-ROI Image Segmentation","authors":"Xiaolu Lu, David Ratcliffe, Tsu-Ting Kao, Aristarkh Tikhonov, Lester Litchfield, Craig Rodger, Kaier Wang","doi":"10.1609/hcomp.v11i1.27552","DOIUrl":null,"url":null,"abstract":"Collecting high quality annotations to construct an evaluation dataset is essential for assessing the true performance of machine learning models. One popular way of performing data annotation is via crowdsourcing, where quality can be of concern. Despite much prior work addressing the annotation quality problem in crowdsourcing generally, little has been discussed in detail for image segmentation tasks. These tasks often require pixel-level annotation accuracy, and is relatively complex when compared to image classification or object detection with bounding-boxes. In this paper, we focus on image segmentation annotation via crowdsourcing, where images may not have been collected in a controlled way. In this setting, the task of annotating may be non-trivial, where annotators may experience difficultly in differentiating between regions-of-interest (ROIs) and background pixels. We implement an annotation process and examine the effectiveness of several in-situ and manual quality assurance and quality control mechanisms. We implement an annotation process on a medical image annotation task and examine the effectiveness of several in-situ and manual quality assurance and quality control mechanisms. Our observations on this task are three-fold. Firstly, including an onboarding and a pilot phase improves quality assurance as annotators can familiarize themselves with the task, especially when the definition of ROIs is ambiguous. Secondly, we observe high variability of annotation times, leading us to believe it cannot be relied upon as a source of information for quality control. When performing agreement analysis, we also show that global-level inter-rater agreement is insufficient to provide useful information, especially when annotator skill levels vary. Thirdly, we recognize that reviewing all annotations can be time-consuming and often infeasible, and there currently exist no mechanisms to reduce the workload for reviewers. Therefore, we propose a method to create a priority list of images for review based on inter-rater agreement. Our experiments suggest that this method can be used to improve reviewer efficiency when compared to a baseline approach, especially if a fixed work budget is required.","PeriodicalId":87339,"journal":{"name":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... AAAI Conference on Human Computation and Crowdsourcing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/hcomp.v11i1.27552","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Collecting high-quality annotations to construct an evaluation dataset is essential for assessing the true performance of machine learning models. One popular way of performing data annotation is via crowdsourcing, where quality can be of concern. Despite much prior work addressing the annotation quality problem in crowdsourcing generally, little has been discussed in detail for image segmentation tasks. These tasks often require pixel-level annotation accuracy and are relatively complex when compared to image classification or object detection with bounding boxes. In this paper, we focus on image segmentation annotation via crowdsourcing, where images may not have been collected in a controlled way. In this setting, the task of annotating may be non-trivial, and annotators may experience difficulty in differentiating between regions-of-interest (ROIs) and background pixels. We implement an annotation process on a medical image annotation task and examine the effectiveness of several in-situ and manual quality assurance and quality control mechanisms. Our observations on this task are three-fold. Firstly, including an onboarding and a pilot phase improves quality assurance, as annotators can familiarize themselves with the task, especially when the definition of ROIs is ambiguous. Secondly, we observe high variability in annotation times, leading us to believe they cannot be relied upon as a source of information for quality control. When performing agreement analysis, we also show that global-level inter-rater agreement is insufficient to provide useful information, especially when annotator skill levels vary. Thirdly, we recognize that reviewing all annotations can be time-consuming and often infeasible, and there currently exist no mechanisms to reduce the workload for reviewers. Therefore, we propose a method to create a priority list of images for review based on inter-rater agreement. Our experiments suggest that this method can improve reviewer efficiency when compared to a baseline approach, especially when a fixed work budget is required.
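To make the proposed prioritization concrete, the following is a minimal sketch of an agreement-based review priority list: for each image, compute the mean pairwise agreement across annotators' segmentation masks, then review the lowest-agreement images first. It assumes each image has two or more boolean NumPy masks of equal shape; the function names and the choice of intersection-over-union (IoU) as the agreement measure are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np
from itertools import combinations

def pairwise_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks of equal shape."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    # By convention, two empty masks agree perfectly.
    return 1.0 if union == 0 else float(intersection) / float(union)

def mean_pairwise_agreement(masks: list[np.ndarray]) -> float:
    """Mean IoU over all annotator pairs for one image (needs >= 2 masks)."""
    pairs = list(combinations(masks, 2))
    return sum(pairwise_iou(a, b) for a, b in pairs) / len(pairs)

def review_priority(annotations: dict[str, list[np.ndarray]]) -> list[str]:
    """Order image ids by ascending agreement, so the most contentious
    images are reviewed first under a fixed reviewing budget."""
    scores = {image_id: mean_pairwise_agreement(masks)
              for image_id, masks in annotations.items()}
    return sorted(scores, key=scores.get)

# Hypothetical usage: with a budget of k reviews, take the first k images.
# annotations = {"img_001": [mask_by_alice, mask_by_bob], ...}
# to_review = review_priority(annotations)[:k]
```

Under a fixed budget of k reviews, a reviewer would take the first k images from the returned list; a baseline such as reviewing images in arrival order would ignore the agreement scores entirely.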