Adaptive bias learning via gradient-based reweighting and constrained pruning for robust Visual Question Answering

IF 3.5 | Zone 3, Computer Science | JCR Q2, Computer Science, Artificial Intelligence

Computer Vision and Image Understanding | Pub Date: 2025-08-28 | DOI: 10.1016/j.cviu.2025.104484

Zukun Wan, Runmin Wang, Xingdong Song, Juan Xu, Xiaofei Cao, Jielei Hei, Shengrong Yuan, Yajun Ding, Changxin Gao

Volume 260, Article 104484 | https://www.sciencedirect.com/science/article/pii/S1077314225002073

Citations: 0

Abstract


Adaptive bias learning via gradient-based reweighting and constrained pruning for robust Visual Question Answering

Visual Question Answering (VQA) poses significant challenges for cross-modal reasoning because models are susceptible to dataset biases, spurious correlations, and shortcut learning, all of which undermine robustness. Ensemble methods mitigate bias by jointly optimizing a bias model and a target model during training, but their efficacy remains limited by suboptimal bias exploitation and model capacity imbalances. To address this, we propose the Adaptive Bias Learning Network (ABLNet), a novel framework that systematically enhances bias capture for improved generalization. Our approach introduces two key innovations: (1) gradient-driven sample reweighting, which quantifies per-sample bias magnitude via training gradients and prioritizes low-bias samples to refine bias model training; and (2) constrained network pruning, which deliberately restricts the bias model's capacity to sharpen its focus on bias patterns. Extensive evaluations on the VQA-CPv1, VQA-CPv2, and VQA-v2 benchmarks confirm ABLNet's superiority and demonstrate its generalizability across diverse question types. The code will be released at https://github.com/runminwang/ABLNet.
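The abstract names the two components but gives no formulas, so the following is only a minimal sketch of what they could look like, not ABLNet's actual method. It assumes that a sample's bias magnitude can be approximated from the norm of the cross-entropy gradient with respect to the bias model's logits (a small gradient is read as "already explained by bias"), and that the capacity constraint is plain magnitude pruning at a fixed keep ratio. The names `grad_norm`, `sample_weights`, `magnitude_prune`, and `keep_ratio` are hypothetical and do not come from the paper.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_norm(logits, label):
    """L2 norm of d(cross-entropy)/d(logits), which equals softmax(logits) - one_hot(label)."""
    probs = softmax(logits)
    return math.sqrt(sum((p - (1.0 if k == label else 0.0)) ** 2
                         for k, p in enumerate(probs)))

def sample_weights(batch_logits, labels, eps=1e-8):
    """Assumed scoring rule: small gradient -> high bias score -> low weight."""
    norms = [grad_norm(z, y) for z, y in zip(batch_logits, labels)]
    top = max(norms) + eps
    bias = [1.0 - g / top for g in norms]      # assumed bias score in [0, 1]
    raw = [1.0 - b for b in bias]              # prioritize low-bias samples
    mean = (sum(raw) / len(raw)) or 1.0        # guard against an all-zero batch
    return [r / mean for r in raw], bias       # weights average to 1

def magnitude_prune(weights, keep_ratio):
    """Zero all but the largest-magnitude weights (ties at the threshold are kept)."""
    k = max(1, int(round(keep_ratio * len(weights))))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

# Toy batch: sample 0 is predicted confidently and correctly, sample 1 is not.
logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 0]
weights, bias = sample_weights(logits, labels)
pruned = magnitude_prune([0.5, -0.1, 0.05, -2.0], keep_ratio=0.5)
```

In this toy batch the confidently fitted sample receives the higher assumed bias score and the lower weight. In a real training loop the weights would scale the per-sample loss of the bias model, and the pruning mask would be applied to that model's parameters. Both choices are illustrative guesses about the design, so the released code should be treated as the reference.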


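Source journal

Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.80

Self-citation rate: 4.40%

Articles published: 112

Review time: 79 days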

About the journal: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research Areas Include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems