Semi-Supervised Majority Voting for crowdsourcing
Hao Yu, Shichao Zhang, Jiaye Li, Chengqing Li
Information Fusion, Volume 125, Article 103412 (published 2025-07-11)
DOI: 10.1016/j.inffus.2025.103412
https://www.sciencedirect.com/science/article/pii/S1566253525004853
Abstract:
Crowdsourced datasets often suffer from missing labels, significantly degrading classifier performance. In this paper, we propose Semi-Supervised Majority Voting (SSMV), a novel framework that integrates semi-supervised learning into the aggregation process to mitigate these effects. First, SSMV partitions the crowdsourcing label matrix into a “sparse” region (with many missing entries) and a “dense” region (with mostly observed labels), yielding two complementary sample sets. Next, it jointly learns a reconstruction coefficient matrix, regularized by an $\ell_{2,1}$-norm to suppress noise and redundancy, by minimizing the discrepancy between the original and reconstructed label matrices. A graph-based Laplacian term preserves the intrinsic manifold structure during reconstruction, while a learned worker-selection vector filters out low-quality annotators. Finally, we apply classic majority voting to the refined label matrix to infer the final labels. Extensive experiments on synthetic and real-world datasets demonstrate that SSMV consistently outperforms state-of-the-art crowdsourcing classifiers across multiple metrics. By explicitly modeling the relationship between missing-label patterns and overall label distributions, SSMV not only recovers missing labels more accurately but also improves overall classification accuracy. This semi-supervised mechanism is readily extensible to other aggregation algorithms, providing a general strategy for enhancing crowdsourced label quality.
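The abstract names several standard building blocks: a sparse/dense split of the label matrix, an $\ell_{2,1}$-norm regularizer, a graph-Laplacian smoothness term, a worker-selection vector, and majority voting. The following is a minimal pure-Python sketch of those pieces in isolation, not the authors' SSMV optimization. It assumes a label matrix with items as rows and workers as columns, a hypothetical `MISSING = -1` sentinel, a hypothetical 50% observed-label threshold for the split, a symmetric affinity matrix for the Laplacian term, and a 0/1 mask standing in for the learned worker-selection vector. The joint learning of the reconstruction coefficient matrix is not reproduced.

```python
from collections import Counter
import math

MISSING = -1  # hypothetical sentinel for an unobserved label


def split_sparse_dense(labels, min_observed=0.5):
    """Split item indices by the fraction of workers who labeled each item.

    `min_observed` is a hypothetical threshold; the paper's rule may differ.
    Returns (sparse_indices, dense_indices).
    """
    sparse, dense = [], []
    for i, row in enumerate(labels):
        observed = sum(1 for y in row if y != MISSING)
        ratio = observed / len(row) if row else 0.0
        (dense if ratio >= min_observed else sparse).append(i)
    return sparse, dense


def l21_norm(matrix):
    """l_{2,1} norm: the sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def laplacian_smoothness(features, weights):
    """tr(X^T L X) with L = D - W, assuming W is symmetric.

    Computed via the identity 0.5 * sum_ij w_ij * ||x_i - x_j||^2.
    """
    total = 0.0
    for i, xi in enumerate(features):
        for j, xj in enumerate(features):
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            total += weights[i][j] * sq_dist
    return 0.5 * total


def majority_vote(labels, worker_mask=None):
    """Per-item majority vote, skipping missing labels and masked-out workers.

    `worker_mask` is a hypothetical 0/1 stand-in for a worker-selection vector.
    Ties go to the smallest label; items with no usable votes get MISSING.
    """
    n_workers = len(labels[0]) if labels else 0
    mask = worker_mask if worker_mask is not None else [1] * n_workers
    result = []
    for row in labels:
        votes = Counter(y for y, keep in zip(row, mask) if keep and y != MISSING)
        if not votes:
            result.append(MISSING)
            continue
        top = max(votes.values())
        result.append(min(y for y, c in votes.items() if c == top))
    return result


# Toy example: 3 items labeled by 4 workers, binary classes {0, 1}.
labels = [
    [0, 0, 1, MISSING],
    [1, MISSING, MISSING, MISSING],
    [1, 1, 0, 0],
]
print(split_sparse_dense(labels))                         # ([1], [0, 2])
print(majority_vote(labels))                              # [0, 1, 0]
print(majority_vote(labels, worker_mask=[1, 1, 1, 0]))    # [0, 1, 1]
```

Tie-breaking toward the smallest label is an arbitrary choice made here only for determinism. The $\ell_{2,1}$ penalty is a common choice when whole rows should be driven to zero (row sparsity), which matches the stated goal of suppressing noisy or redundant entries, and the Laplacian term is small when items that are strongly connected in the affinity graph have similar representations.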
About the journal:
Information Fusion serves as a central platform for showcasing advances in multi-sensor, multi-source, and multi-process information fusion, fostering collaboration among the diverse disciplines that drive its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating applications to real-world problems, are welcome.