Worker Similarity-Based Label Completion for Crowdsourcing

IF 5.7 · Zone 3 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data Pub Date : 2024-07-10 DOI:10.1109/TBDATA.2024.3426310

Xue Wu;Liangxiao Jiang;Wenjun Zhang;Chaoqun Li

Volume 11, Issue 2, Pages 710-721 · https://ieeexplore.ieee.org/document/10592826/

Citations: 0

Abstract

In real-world crowdsourcing, each worker typically annotates only a few instances, which leaves the crowdsourcing label matrix highly sparse. Consequently, only a small number of workers influence the inferred integrated label of each instance, which may weaken the performance of label integration algorithms. To address this problem, we propose a novel label completion algorithm called Worker Similarity-based Label Completion (WSLC). WSLC rests on the assumption that workers with similar cognitive abilities will assign similar labels to the same instances. Specifically, we first construct a data set for each worker that includes all instances annotated by this worker and learn a feature vector for each worker. Then, we define a metric based on cosine similarity that estimates worker similarity from the learned feature vectors. Finally, we complete each worker's labels on unannotated instances using the worker similarity and the annotations of similar workers. Experimental results on one real-world and 34 simulated crowdsourced data sets consistently show that WSLC effectively addresses the sparsity of the crowdsourcing label matrix and improves the accuracy of label integration algorithms.
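The completion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how worker feature vectors are learned or how the similarity metric and completion rule are defined in detail. The sketch therefore assumes the feature vectors are already available (`worker_vecs`), uses plain cosine similarity, discards negative similarities, and fills each missing label by a similarity-weighted vote. The names `MISSING`, `cosine_similarity_matrix`, and `complete_labels` are hypothetical.

```python
import numpy as np

MISSING = -1  # assumed marker for "this worker did not annotate this instance"


def cosine_similarity_matrix(worker_vecs):
    """Pairwise cosine similarity between the rows of worker_vecs."""
    norms = np.linalg.norm(worker_vecs, axis=1, keepdims=True)
    unit = worker_vecs / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def complete_labels(labels, worker_vecs, n_classes):
    """Fill MISSING entries of an (instances x workers) integer label matrix.

    A missing label of worker r on instance i is set to the class with the
    largest similarity-weighted vote among the workers who annotated i.
    """
    sim = cosine_similarity_matrix(worker_vecs)
    np.fill_diagonal(sim, 0.0)      # a worker does not vote for itself
    sim = np.clip(sim, 0.0, None)   # assumption: ignore negatively similar workers
    completed = labels.copy()
    n_instances, n_workers = labels.shape
    for i in range(n_instances):
        annotated = labels[i] != MISSING
        if not annotated.any():
            continue                # no annotations to borrow from
        idx = np.flatnonzero(annotated)
        votes = np.zeros((n_workers, n_classes))
        votes[idx, labels[i, idx]] = 1.0   # one-hot votes of the annotators
        scores = sim @ votes               # shape: (n_workers, n_classes)
        for r in np.flatnonzero(~annotated):
            if scores[r].sum() > 0:        # leave the gap if no similar annotator
                completed[i, r] = int(np.argmax(scores[r]))
    return completed
```

Under these assumptions, a worker's empty cells are filled mostly by the workers whose feature vectors point in nearly the same direction, so the completed matrix is denser before any label integration algorithm is applied. Cells with no positively similar annotator are left as `MISSING` rather than guessed.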

Source journal

IEEE Transactions on Big Data

CiteScore: 11.80
Self-citation rate: 2.80%
Articles published: 114

About the journal: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.