VADS: Visuo-Adaptive DualStrike attack on visual question answer

Impact factor: 4.3 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Vision and Image Understanding | Pub Date: 2024-08-31 | DOI: 10.1016/j.cviu.2024.104137

Boyuan Zhang, Jiaxu Li, Yucheng Shi, Yahong Han, Qinghua Hu

Volume 249, Article 104137

Full text: https://www.sciencedirect.com/science/article/pii/S1077314224002182

Citations: 0

Abstract

Visual Question Answering (VQA) is a fundamental task in the fields of computer vision and natural language processing. The adversarial vulnerability of VQA models is crucial to their reliability in real-world applications. However, current VQA attacks focus mainly on white-box and transfer-based settings, which require the attacker to have full or partial prior knowledge of the victim VQA model. In addition, query-based VQA attacks require a large number of queries, which the victim model may detect. In this paper, we propose the Visuo-Adaptive DualStrike (VADS) attack, a novel adversarial attack method that combines transfer-based and query-based strategies to exploit vulnerabilities in VQA systems. Unlike current VQA attacks, which rely on only one of these approaches, VADS uses a momentum-like ensemble method to search for potential attack targets and compress the perturbation. It then employs a query-based strategy to dynamically adjust the perturbation weight of each surrogate model. We evaluate VADS on eight VQA models and two datasets. The results show that VADS outperforms existing adversarial techniques in both efficiency and success rate. Our code is available at: https://github.com/stevenzhang9577/VADS.
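As a rough illustration of the two stages the abstract describes, the Python sketch below combines a momentum-based update over a weighted ensemble of surrogate models with occasional hard-label queries to the victim that reweight the surrogates. It is not the authors' implementation, which lives in the repository linked in the abstract. The toy linear classifiers, the function names `input_gradient` and `dual_stage_attack`, the multiplicative reweighting rule, and every hyperparameter are assumptions made for illustration. The target search and perturbation compression steps mentioned in the abstract are not modeled here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def input_gradient(W, x, label):
    """Gradient of cross-entropy w.r.t. x for a toy linear model (logits = W @ x)."""
    p = softmax(W @ x)
    p[label] -= 1.0
    return W.T @ p

def dual_stage_attack(x, label, surrogates, victim_predict,
                      eps=0.5, steps=20, mu=0.9, query_every=5, eta=0.5):
    """Hypothetical sketch: momentum ensemble step plus query-based reweighting."""
    weights = np.full(len(surrogates), 1.0 / len(surrogates))
    alpha = eps / steps                      # per-step size in the L-inf ball
    momentum = np.zeros_like(x)
    x_adv = x.copy()
    queries = 0
    for t in range(1, steps + 1):
        # Transfer stage: weighted ensemble gradient with momentum accumulation.
        grad = sum(w * input_gradient(W, x_adv, label)
                   for w, W in zip(weights, surrogates))
        momentum = mu * momentum + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = np.clip(x_adv + alpha * np.sign(momentum), x - eps, x + eps)
        # Query stage: every few steps, ask the victim for its hard label.
        if t % query_every == 0:
            queries += 1
            victim_label = victim_predict(x_adv)
            if victim_label != label:
                break                        # the victim's answer changed
            # Assumed heuristic: upweight surrogates that agree with the victim.
            agree = np.array([float(np.argmax(W @ x_adv) == victim_label)
                              for W in surrogates])
            weights = weights * np.exp(eta * agree)
            weights /= weights.sum()
    return x_adv, weights, queries

# Toy setup (all values are made up): three surrogates, one related victim.
rng = np.random.default_rng(0)
dim, classes = 16, 4
surrogates = [rng.normal(size=(classes, dim)) for _ in range(3)]
victim_W = surrogates[0] + 0.3 * rng.normal(size=(classes, dim))
victim_predict = lambda v: int(np.argmax(victim_W @ v))
x = rng.normal(size=dim)
label = victim_predict(x)
x_adv, weights, queries = dual_stage_attack(x, label, surrogates, victim_predict)
print("victim label before/after:", label, victim_predict(x_adv))
print("surrogate weights:", np.round(weights, 3), "queries used:", queries)
```

With the defaults above, the victim is queried at most `steps // query_every` times (four), which reflects the motivation stated in the abstract: spending queries only on reweighting surrogates rather than on estimating gradients directly.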

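Source journal

Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days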

About the journal: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems