{"title":"基于工人相似度的众包标签完成","authors":"Xue Wu;Liangxiao Jiang;Wenjun Zhang;Chaoqun Li","doi":"10.1109/TBDATA.2024.3426310","DOIUrl":null,"url":null,"abstract":"In real-world crowdsourcing scenarios, it is a common phenomenon that each worker only annotates a few instances, resulting in a significantly sparse crowdsourcing label matrix. Consequently, only a small number of workers influence the inferred integrated label of each instance, which may weaken the performance of label integration algorithms. To address this problem, we propose a novel label completion algorithm called Worker Similarity-based Label Completion (WSLC). WSLC is grounded on the assumption that workers with similar cognitive abilities will annotate similar labels on the same instances. Specifically, we first construct a data set for each worker that includes all instances annotated by this worker and learn a feature vector for each worker. Then, we define a metric based on cosine similarity to estimate worker similarity based on the learned feature vectors. Finally, we complete the labels for each worker on unannotated instances based on the worker similarity and the annotations of similar workers. The experimental results on one real-world and 34 simulated crowdsourced data sets consistently show that WSLC effectively addresses the problem of the sparse crowdsourcing label matrix and enhances the integration accuracies of label integration algorithms.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"11 2","pages":"710-721"},"PeriodicalIF":7.5000,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Worker Similarity-Based Label Completion for Crowdsourcing\",\"authors\":\"Xue Wu;Liangxiao Jiang;Wenjun Zhang;Chaoqun Li\",\"doi\":\"10.1109/TBDATA.2024.3426310\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In real-world crowdsourcing scenarios, it is a common phenomenon that each worker only annotates a few instances, resulting in a significantly sparse crowdsourcing label matrix. Consequently, only a small number of workers influence the inferred integrated label of each instance, which may weaken the performance of label integration algorithms. To address this problem, we propose a novel label completion algorithm called Worker Similarity-based Label Completion (WSLC). WSLC is grounded on the assumption that workers with similar cognitive abilities will annotate similar labels on the same instances. Specifically, we first construct a data set for each worker that includes all instances annotated by this worker and learn a feature vector for each worker. Then, we define a metric based on cosine similarity to estimate worker similarity based on the learned feature vectors. Finally, we complete the labels for each worker on unannotated instances based on the worker similarity and the annotations of similar workers. The experimental results on one real-world and 34 simulated crowdsourced data sets consistently show that WSLC effectively addresses the problem of the sparse crowdsourcing label matrix and enhances the integration accuracies of label integration algorithms.\",\"PeriodicalId\":13106,\"journal\":{\"name\":\"IEEE Transactions on Big Data\",\"volume\":\"11 2\",\"pages\":\"710-721\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2024-07-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Big Data\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10592826/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Big Data","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10592826/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Worker Similarity-Based Label Completion for Crowdsourcing
In real-world crowdsourcing scenarios, it is a common phenomenon that each worker only annotates a few instances, resulting in a significantly sparse crowdsourcing label matrix. Consequently, only a small number of workers influence the inferred integrated label of each instance, which may weaken the performance of label integration algorithms. To address this problem, we propose a novel label completion algorithm called Worker Similarity-based Label Completion (WSLC). WSLC is grounded on the assumption that workers with similar cognitive abilities will annotate similar labels on the same instances. Specifically, we first construct a data set for each worker that includes all instances annotated by this worker and learn a feature vector for each worker. Then, we define a metric based on cosine similarity to estimate worker similarity based on the learned feature vectors. Finally, we complete the labels for each worker on unannotated instances based on the worker similarity and the annotations of similar workers. The experimental results on one real-world and 34 simulated crowdsourced data sets consistently show that WSLC effectively addresses the problem of the sparse crowdsourcing label matrix and enhances the integration accuracies of label integration algorithms.
期刊介绍:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.