{"title":"自适应视觉语言提示学习者对噪声标签的学习","authors":"Changhui Hu , Bhalaji Nagarajan , Ricardo Marques , Petia Radeva","doi":"10.1016/j.jvcir.2025.104550","DOIUrl":null,"url":null,"abstract":"<div><div>Training deep learning models requires manual labelling of a large volume of diverse data that is a tedious and time-consuming process. As humans are prone to errors, large-scale data labelling often introduces label noise, leading to degradation in the performance of deep neural networks. Recently, pre-trained models on extensive multi-modal data have shown remarkable performance in computer vision tasks. However, their use to tackle the problem of learning with noisy labels is still in its infancy, due to high computational complexity and training costs. In this work, we propose a novel approach, AVL-Prompter, to effectively leverage vision-language-pre-trained models for learning with noisy labels. The key idea of our method is the use of shared deep learnable prompts between visual and textual encoders, allowing us to effectively adapt large V-L models to the downstream task of learning with noisy labels. Our technique exhibits superior performance, particularly in high-noise settings, outperforming state-of-the-art methods in several datasets with synthetic and real label noise. Our contribution comes from a novel, simple, but highly efficient methodological path to learning with noisy labels while remaining straightforward to implement. The code is available at <span><span>https://github.com/bhalajin/AVL-Prompter</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"111 ","pages":"Article 104550"},"PeriodicalIF":3.1000,"publicationDate":"2025-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive Vision-Language Prompt Learners for Learning with Noisy Labels\",\"authors\":\"Changhui Hu , Bhalaji Nagarajan , Ricardo Marques , Petia Radeva\",\"doi\":\"10.1016/j.jvcir.2025.104550\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Training deep learning models requires manual labelling of a large volume of diverse data that is a tedious and time-consuming process. As humans are prone to errors, large-scale data labelling often introduces label noise, leading to degradation in the performance of deep neural networks. Recently, pre-trained models on extensive multi-modal data have shown remarkable performance in computer vision tasks. However, their use to tackle the problem of learning with noisy labels is still in its infancy, due to high computational complexity and training costs. In this work, we propose a novel approach, AVL-Prompter, to effectively leverage vision-language-pre-trained models for learning with noisy labels. The key idea of our method is the use of shared deep learnable prompts between visual and textual encoders, allowing us to effectively adapt large V-L models to the downstream task of learning with noisy labels. Our technique exhibits superior performance, particularly in high-noise settings, outperforming state-of-the-art methods in several datasets with synthetic and real label noise. Our contribution comes from a novel, simple, but highly efficient methodological path to learning with noisy labels while remaining straightforward to implement. 
The code is available at <span><span>https://github.com/bhalajin/AVL-Prompter</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":54755,\"journal\":{\"name\":\"Journal of Visual Communication and Image Representation\",\"volume\":\"111 \",\"pages\":\"Article 104550\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2025-08-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Visual Communication and Image Representation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1047320325001646\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Visual Communication and Image Representation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1047320325001646","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Adaptive Vision-Language Prompt Learners for Learning with Noisy Labels
Training deep learning models requires manual labelling of large volumes of diverse data, a tedious and time-consuming process. Because humans are prone to errors, large-scale data labelling often introduces label noise, which degrades the performance of deep neural networks. Recently, models pre-trained on extensive multi-modal data have shown remarkable performance in computer vision tasks. However, their use for the problem of learning with noisy labels is still in its infancy, due to high computational complexity and training costs. In this work, we propose a novel approach, AVL-Prompter, to effectively leverage vision-language pre-trained models for learning with noisy labels. The key idea of our method is the use of deep learnable prompts shared between the visual and textual encoders, allowing us to effectively adapt large vision-language (V-L) models to the downstream task of learning with noisy labels. Our technique exhibits superior performance, particularly in high-noise settings, outperforming state-of-the-art methods on several datasets with synthetic and real label noise. Our contribution is a novel, simple, yet highly efficient methodological path to learning with noisy labels that remains straightforward to implement. The code is available at https://github.com/bhalajin/AVL-Prompter.
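The shared-prompt idea described in the abstract can be pictured with a short PyTorch sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name SharedPromptLearner, the prompt count, the embedding dimensions, and the per-modality linear projections are all hypothetical, and random tensors stand in for the token embeddings of a CLIP-like visual and textual encoder. It shows only the core mechanism of one shared set of learnable prompt vectors being projected into both encoders' token spaces and prepended to their input sequences.

# Minimal sketch (assumed names and dimensions; see the repository above
# for the authors' actual AVL-Prompter implementation).
import torch
import torch.nn as nn

class SharedPromptLearner(nn.Module):
    def __init__(self, n_prompts=4, shared_dim=512, vis_dim=768, txt_dim=512):
        super().__init__()
        # One set of prompt vectors shared by both modalities.
        self.prompts = nn.Parameter(torch.randn(n_prompts, shared_dim) * 0.02)
        # Modality-specific projections into each encoder's token space.
        self.to_visual = nn.Linear(shared_dim, vis_dim)
        self.to_textual = nn.Linear(shared_dim, txt_dim)

    def forward(self, vis_tokens, txt_tokens):
        b = vis_tokens.size(0)
        # Project the shared prompts and broadcast them over the batch.
        vis_p = self.to_visual(self.prompts).expand(b, -1, -1)
        txt_p = self.to_textual(self.prompts).expand(b, -1, -1)
        # Prepend the projected prompts to each encoder's input sequence;
        # the pre-trained V-L encoders would then process these sequences.
        return (torch.cat([vis_p, vis_tokens], dim=1),
                torch.cat([txt_p, txt_tokens], dim=1))

# Toy usage: random tensors stand in for real encoder token embeddings.
learner = SharedPromptLearner()
vis = torch.randn(2, 197, 768)   # e.g. ViT patch tokens
txt = torch.randn(2, 77, 512)    # e.g. text token embeddings
v, t = learner(vis, txt)
print(v.shape, t.shape)          # (2, 201, 768) and (2, 81, 512)

In prompt learning of this kind, training typically updates only the prompt vectors and projections while the large pre-trained backbone stays frozen, which is what keeps the adaptation cheap relative to full fine-tuning.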
Journal introduction:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.