Worker similarity-based noise correction for crowdsourcing

Impact factor: 3.0 | Zone 2 (Computer Science) | JCR Q2 (Computer Science, Information Systems)

Information Systems Pub Date : 2023-11-30 DOI:10.1016/j.is.2023.102321

Yufei Hu , Liangxiao Jiang , Wenjun Zhang

Information Systems, volume 121, Article 102321. https://www.sciencedirect.com/science/article/pii/S0306437923001576

Citations: 0

Abstract

Crowdsourcing offers a cost-effective way to obtain multiple noisy labels for each instance by employing multiple crowd workers. Label integration is then used to infer an integrated label for each instance. Despite the effectiveness of label integration algorithms, a certain degree of noise always remains in the integrated labels, so noise correction algorithms have been proposed to reduce its impact. However, almost all existing noise correction algorithms focus only on individual workers and ignore the correlations among workers. In this paper, we argue that similar workers have similar annotating skills and tend to be consistent when annotating the same or similar instances. Based on this premise, we propose a novel noise correction algorithm called worker similarity-based noise correction (WSNC). First, WSNC exploits the annotating information of similar workers on similar instances to estimate the quality of each label annotated by each worker on each instance. Then, WSNC re-infers the integrated label of each instance based on the qualities of its multiple noisy labels. Finally, WSNC treats any instance whose re-inferred integrated label differs from its original integrated label as a noise instance and further corrects it. Extensive experiments on a large number of simulated datasets and three real-world crowdsourced datasets verify the effectiveness of WSNC.
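The three steps named in the abstract can be illustrated with a small sketch. This is not the authors' WSNC formulation, which the abstract does not spell out. It is a simplified stand-in under stated assumptions: majority voting plays the role of the original label integration, worker similarity is taken to be the plain agreement rate on co-labeled instances, label quality is the similarity-weighted share of peers who agree, and instance similarity is omitted entirely. All function names are hypothetical.

```python
from collections import Counter, defaultdict

# labels[i][w] is the label worker w gave instance i, or None if w skipped it.
# Assumption: every instance has at least one non-missing label.

def majority_vote(row):
    """Plurality label among a row's non-missing labels; ties go to the smallest label."""
    counts = Counter(label for label in row if label is not None)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)

def worker_similarity(labels, a, b):
    """Agreement rate of workers a and b over instances both labeled (0.0 if none)."""
    shared = [(row[a], row[b]) for row in labels
              if row[a] is not None and row[b] is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)

def flag_noise(labels):
    """Return (original, reinferred, noisy_indices) for a labels[instance][worker] matrix."""
    n_workers = len(labels[0])
    sim = [[worker_similarity(labels, a, b) for b in range(n_workers)]
           for a in range(n_workers)]
    original, reinferred = [], []
    for row in labels:
        original.append(majority_vote(row))
        scores = defaultdict(float)
        for w, label in enumerate(row):
            if label is None:
                continue
            peers = [(sim[w][v], row[v]) for v in range(n_workers)
                     if v != w and row[v] is not None]
            total = sum(s for s, _ in peers)
            # Step 1 (assumed form): quality of w's label is the
            # similarity-weighted share of peers who gave the same label.
            quality = (sum(s for s, l in peers if l == label) / total
                       if total > 0 else 0.5)
            scores[label] += quality
        # Step 2: re-infer the integrated label by a quality-weighted vote.
        best = max(scores.values())
        reinferred.append(min(l for l, s in scores.items() if s == best))
    # Step 3: instances whose re-inferred label differs are flagged as noise.
    noisy = [i for i, (o, r) in enumerate(zip(original, reinferred)) if o != r]
    return original, reinferred, noisy
```

The sketch only flags candidate noise instances. How WSNC "further corrects" them is not described in the abstract, so that step is left out rather than guessed.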

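Source journal

Information Systems (Engineering & Technology; Computer Science: Information Systems)

CiteScore: 9.40
Self-citation rate: 2.70%
Articles published: 112
Review time: 53 days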

Journal description: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.