What Do Visual Models Look At? Dilated Attention for Targeted Transferable Attacks

IF 9.3 | Zone 2, Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision Pub Date : 2025-08-27 DOI:10.1007/s11263-025-02552-x

Zhipeng Wei, Jingjing Chen, Yu-Gang Jiang


Citations: 0

Abstract

Attention maps illustrate what visual models look at when processing benign images. Under adversarial perturbations, however, attention changes significantly. Building on this phenomenon, previous non-targeted transferable attacks manipulate adversarial examples to generate distinct attention maps, disrupting crucial features shared among models. The role of attention in targeted transferable attacks, by contrast, remains unexplored. To address this gap, we analyze how attention changes across surrogate and black-box models, and empirically observe that adversarial examples receiving more features relevant to the adversarial target label exhibit higher transferability across black-box models. Motivated by these findings, we propose the Dilated Attention (DA) attack, which integrates an attention maximization loss and dynamic linear augmentation to improve targeted transferability. The attention maximization loss maximizes the attention maps of the target label from multiple intermediate layers to attract greater attention. Dynamic linear augmentation uses dynamic parameters to augment inputs with a broader range of attention maps, giving the crafted perturbations the robustness to dilate attention across diverse attention distributions. By considering both the objective function and diverse inputs, DA generates adversarial examples with high adversarial transferability against CNNs, ViTs, and adversarially trained models. We hope DA can serve as a foundational attack, guiding future research on targeted transferable attacks. The source code is available at: https://github.com/zhipeng-wei/DialtedAttention.
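The abstract names the two components of DA but gives no formulas, so the following is only a toy sketch under assumed forms, not the authors' implementation. It assumes that the linear augmentation is `scale * x + shift` with parameters resampled at every attack step, that the sampling range widens as the attack progresses, and that the loss is the negative mean of the target-label attention maps averaged over layers. The function names, the schedule, and the default ranges are all hypothetical; attention maps are taken as plain lists of numbers rather than computed from a model.

```python
import random


def linear_augment(pixels, scale, shift):
    """Apply an assumed linear transform scale * x + shift, clipped to [0, 1]."""
    return [min(1.0, max(0.0, scale * p + shift)) for p in pixels]


def sample_dynamic_params(step, total_steps, rng, max_scale_dev=0.5, max_shift=0.2):
    """Hypothetical schedule: the sampling range widens with attack progress."""
    progress = (step + 1) / total_steps
    scale = 1.0 + rng.uniform(-max_scale_dev, max_scale_dev) * progress
    shift = rng.uniform(-max_shift, max_shift) * progress
    return scale, shift


def attention_maximization_loss(attention_maps):
    """Negative mean target-label attention, averaged over layers.

    Minimizing this value maximizes attention (an assumed form of the loss).
    """
    per_layer = [sum(layer) / len(layer) for layer in attention_maps]
    return -sum(per_layer) / len(per_layer)


if __name__ == "__main__":
    rng = random.Random(0)
    image = [0.1, 0.5, 0.9]  # stand-in for flattened pixel values
    for step in range(3):
        scale, shift = sample_dynamic_params(step, total_steps=3, rng=rng)
        print(step, linear_augment(image, scale, shift))
    # Two "layers" of made-up target-label attention values.
    print(attention_maximization_loss([[0.2, 0.4], [0.6, 0.8]]))
```

In a real attack the attention maps would come from intermediate layers of a surrogate model and the loss would be differentiated with respect to the perturbation; the repository linked in the abstract is the reference for the actual method.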


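Source journal

International Journal of Computer Vision (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 29.80

Self-citation rate: 2.10%

Articles published: 163

Review time: 6 months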

Journal description: The International Journal of Computer Vision (IJCV) is a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal accepts several types of articles. Regular articles, of up to 25 journal pages, report significant technical advances of broad interest to the field. Short articles, limited to 10 pages, offer a faster publication path for novel research results. Survey articles, of up to 30 pages, provide critical evaluations of the state of the art in computer vision or tutorial presentations of relevant topics. In addition to technical articles, the journal publishes book reviews, position papers, and editorials by prominent scientific figures, which complement the technical content. Authors are encouraged to include supplementary material online, such as images, video sequences, data sets, and software, to improve the understanding and reproducibility of the published research.