Worker Similarity-Based Label Completion for Crowdsourcing

IF 7.5 | Zone 3 (Computer Science) | JCR Q1 (Computer Science, Information Systems)
Xue Wu;Liangxiao Jiang;Wenjun Zhang;Chaoqun Li
Journal: IEEE Transactions on Big Data, vol. 11, no. 2, pp. 710-721
Publication date: 2024-07-10
Publication type: Journal Article
DOI: 10.1109/TBDATA.2024.3426310
URL: https://ieeexplore.ieee.org/document/10592826/
Open access: no
Citation count: 0

Abstract

In real-world crowdsourcing scenarios, each worker commonly annotates only a few instances, which yields a highly sparse crowdsourcing label matrix. Consequently, only a small number of workers influence the inferred integrated label of each instance, which may weaken the performance of label integration algorithms. To address this problem, we propose a novel label completion algorithm called Worker Similarity-based Label Completion (WSLC). WSLC is grounded in the assumption that workers with similar cognitive abilities will assign similar labels to the same instances. Specifically, we first construct a data set for each worker that includes all instances annotated by this worker and learn a feature vector for each worker. Then, we define a cosine-similarity-based metric that estimates worker similarity from the learned feature vectors. Finally, we complete the labels for each worker on unannotated instances based on the worker similarity and the annotations of similar workers. Experimental results on one real-world and 34 simulated crowdsourced data sets consistently show that WSLC effectively addresses the sparsity of the crowdsourcing label matrix and improves the integration accuracy of label integration algorithms.
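
The last two steps of the abstract (cosine similarity between worker feature vectors, then filling in missing labels from similar workers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the feature vectors are learned or how similar workers' annotations are combined, so the sketch assumes the vectors are already given and uses a plain similarity-weighted vote. The names MISSING, cosine_similarity_matrix, complete_labels, and min_sim are hypothetical and chosen only for illustration.

```python
import numpy as np

MISSING = -1  # hypothetical marker: the worker did not annotate this instance


def cosine_similarity_matrix(worker_vecs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows (one row per worker)."""
    norms = np.linalg.norm(worker_vecs, axis=1, keepdims=True)
    unit = worker_vecs / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def complete_labels(labels, worker_vecs, n_classes, min_sim=0.0):
    """Fill missing entries of an (instances x workers) label matrix.

    Assumption (not from the paper): a missing label for worker w on
    instance i is set to the class with the largest similarity-weighted
    vote among workers who did annotate i and whose similarity to w
    exceeds min_sim. If no such worker exists, the entry stays MISSING.
    """
    labels = np.asarray(labels)
    sim = cosine_similarity_matrix(np.asarray(worker_vecs, dtype=float))
    completed = labels.copy()
    n_instances, n_workers = labels.shape
    for i in range(n_instances):
        annotated = np.flatnonzero(labels[i] != MISSING)
        for w in range(n_workers):
            if labels[i, w] != MISSING:
                continue  # keep the worker's own annotation
            votes = np.zeros(n_classes)
            for u in annotated:
                if sim[w, u] > min_sim:
                    votes[labels[i, u]] += sim[w, u]
            if votes.sum() > 0:
                completed[i, w] = int(np.argmax(votes))
    return completed


if __name__ == "__main__":
    # Toy data: 2 instances, 3 workers, 2 classes; vectors are made up.
    toy_labels = [[0, MISSING, 1],
                  [1, 1, MISSING]]
    toy_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(complete_labels(toy_labels, toy_vecs, n_classes=2))
```

In this toy setting, worker 1 is close to worker 0 in feature space, so worker 1's missing label on the first instance follows worker 0 rather than worker 2. The completed matrix would then be passed to any label integration algorithm, such as majority voting.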
Source journal: IEEE Transactions on Big Data
CiteScore: 11.80
Self-citation rate: 2.80%
Articles published: 114
About the journal: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.