Learning From Crowdsourced Noisy Labels: A signal processing perspective

IF 9.6, Zone 1 (Engineering & Technology), Q1 ENGINEERING, ELECTRICAL & ELECTRONIC
Shahana Ibrahim;Panagiotis A. Traganitis;Xiao Fu;Georgios B. Giannakis
Journal: IEEE Signal Processing Magazine, vol. 42, no. 3, pp. 84-106
Publication date: 2025-09-15
DOI: 10.1109/MSP.2025.3572636
URL: https://ieeexplore.ieee.org/document/11164541/
Citations: 0

Abstract

One of the primary catalysts fueling advances in artificial intelligence (AI) and machine learning (ML) is the availability of massive, curated datasets. A commonly used technique to curate such massive datasets is crowdsourcing, where data are dispatched to multiple annotators. The annotator-produced labels are then fused to serve downstream learning and inference tasks. This annotation process often creates noisy labels for various reasons, such as the limited expertise or unreliability of annotators. Therefore, a core objective in crowdsourcing is to develop methods that effectively mitigate the negative impact of such label noise on learning tasks. This feature article introduces advances in learning from noisy crowdsourced labels. The focus is on key crowdsourcing models and their methodological treatments, from classical statistical models to recent deep learning-based approaches, emphasizing analytical insights and algorithmic developments. In particular, this article reviews the connections between signal processing (SP) theory and methods, such as identifiability of tensor and nonnegative matrix factorization, and novel, principled solutions to longstanding challenges in crowdsourcing, showing how SP perspectives drive the advancement of this field. Furthermore, this article touches upon emerging topics that are critical for developing cutting-edge AI/ML systems, such as crowdsourcing in reinforcement learning with human feedback (RLHF) and direct preference optimization (DPO), which are key techniques for fine-tuning large language models (LLMs).
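As a rough illustration of what "fusing" annotator-produced labels can mean, the sketch below applies plain majority voting, the simplest baseline one might assume. The abstract does not name this method; the function name `majority_vote`, the tie-breaking rule, and the toy data are all hypothetical and are not taken from the article.

```python
from collections import Counter


def majority_vote(labels_per_item):
    """Fuse crowdsourced labels by taking the most frequent label per item.

    labels_per_item: list of lists; each inner list holds the labels that
    different annotators assigned to one data item (annotators may skip items).
    Returns one fused label per item, or None when an item has no labels.
    """
    fused = []
    for labels in labels_per_item:
        if not labels:
            fused.append(None)
            continue
        counts = Counter(labels)
        top = max(counts.values())
        # Assumption: ties are broken by choosing the smallest label,
        # purely to keep the result deterministic.
        fused.append(min(lab for lab, c in counts.items() if c == top))
    return fused


# Hypothetical toy data: 4 items, up to 3 annotators, classes {0, 1}
noisy = [[1, 1, 0], [0, 0, 0], [1, 0], []]
print(majority_vote(noisy))  # [1, 0, 0, None]
```

Majority voting treats every annotator as equally reliable. The statistical and deep learning-based models that the abstract refers to are aimed at the noisy-label setting where that assumption may not hold.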
Source journal
IEEE Signal Processing Magazine (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 27.20
Self-citation rate: 0.70%
Articles published: 123
Review time: 6-12 weeks
Journal description: IEEE Signal Processing Magazine is a publication that focuses on signal processing research and applications. It publishes tutorial-style articles, columns, and forums that cover a wide range of topics related to signal processing. The magazine aims to provide the research, educational, and professional communities with the latest technical developments, issues, and events in the field. It serves as the main communication platform for the society, addressing important matters that concern all members.