Defending Against Universal Patch Attacks by Restricting Token Attention in Vision Transformers

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Pub Date: 2023-06-04, DOI: 10.1109/ICASSP49357.2023.10096862

Hongwei Yu, Jiansheng Chen, Huimin Ma, Cheng Yu, Xinlong Ding

Citations: 1

Abstract

Previous work has shown that vision transformers (ViTs), like CNNs, are vulnerable to universal adversarial patch attacks. In this paper, we show empirically and explain mathematically that the shallow tokens in the transformer and the attention of the network can strongly influence the classification result. Adversarial patches usually produce a large feature norm in the corresponding shallow token vectors, which attracts an anomalous amount of attention. Motivated by this, we propose a restriction operation on the attention matrix that effectively reduces the influence of the patch region. Experiments on ImageNet show that our proposal effectively improves ViT's robustness to white-box universal patch attacks while maintaining satisfactory classification accuracy on clean samples.
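The mechanism described in the abstract can be illustrated with a toy example. The sketch below is not the authors' method: the abstract does not specify the restriction operation, so `restrict_row`, the `cap` value, and the toy keys are all hypothetical choices made for illustration. It shows how a key vector with a large norm can dominate a softmax attention row, and how clipping each attention weight at a threshold bounds the weight given to that token.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_row(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def restrict_row(weights, cap):
    """Hypothetical restriction: clip every attention weight at `cap`.

    The row is deliberately not renormalized, so it may sum to less than 1.
    This is an illustrative assumption, not the operation from the paper.
    """
    return [min(w, cap) for w in weights]


# Three ordinary tokens and one "patch" token whose key has a large norm.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [8.0, 8.0]]
query = [1.0, 1.0]

weights = attention_row(query, keys)         # patch token takes almost all attention
restricted = restrict_row(weights, cap=0.4)  # patch token's weight is bounded by cap

print("unrestricted:", [round(w, 4) for w in weights])
print("restricted:  ", [round(w, 4) for w in restricted])
```

In this toy setting the large-norm key receives nearly all of the attention before restriction, and the clipped row limits its weight to the chosen cap while leaving the other weights unchanged. Whether to renormalize after clipping is a design choice that the abstract does not settle.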



