M3C: Resist Agnostic Attacks by Mitigating Consistent Class Confusion Prior.

Impact Factor 18.6 · JCR Q1, Computer Science, Artificial Intelligence · CAS Region 1, Computer Science
Xiaowei Fu,Fuxiang Huang,Guoyin Wang,Xinbo Gao,Lei Zhang
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
DOI: 10.1109/tpami.2025.3614495
Publication date: 2025-09-25
Citations: 0

Abstract

Adversarial attacks are a major obstacle to the deployment of deep neural networks (DNNs) in security-sensitive applications. To address these adversarial perturbations, various adversarial defense strategies have been developed, with Adversarial Training (AT) being one of the most effective methods to protect neural networks from adversarial attacks. However, existing AT methods struggle against training-agnostic attacks due to their limited generalizability. This suggests that AT models lack a unified perspective across diverse attacks from which to mount a universal defense. This paper sheds light on a generalizable prior under various attacks: consistent class confusion (3C), i.e., an AT classifier often confuses predictions between the correct and ambiguous classes in a highly similar pattern across diverse attacks. Relying on this latent prior as a bridge between seen and agnostic attacks, we propose a more generalized AT model that mitigates consistent class confusion (M3C) to resist training-agnostic attacks. Specifically, we optimize an Adversarial Confusion Loss (ACL), weighted by uncertainty, to distinguish the most confused classes and encourage the AT model to focus on these confused samples. To suppress malignant features that affect correct predictions and produce significant class confusion, we propose a Gradient-Aware Attention (GAA) mechanism to enhance the classification confidence of correct classes and eliminate class confusion. Experiments on multiple benchmarks and network architectures demonstrate that our M3C model significantly improves the generalization of AT robustness against agnostic attacks. The discovery of the 3C prior reveals the potential for defending against a wide range of attacks, and provides a new perspective for overcoming this challenge in the field.
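The abstract's two central ideas can be illustrated with a toy numerical sketch (a hypothetical illustration under assumed definitions, not the paper's actual implementation; the function names and the exact form of the loss are my assumptions): first, the 3C prior says that the off-target (confusion) probability pattern of an AT classifier is highly similar across different attacks, which we can measure by cosine similarity; second, an uncertainty-weighted confusion loss in the spirit of ACL penalizes the probability of each sample's most confused wrong class, weighting high-entropy (uncertain) samples more heavily.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def confusion_pattern(probs, y):
    """Off-target probability mass per sample: zero out the true class."""
    p = probs.copy()
    p[np.arange(len(y)), y] = 0.0
    return p

def confusion_consistency(p_a, p_b):
    """Cosine similarity between confusion patterns under two attacks.
    The 3C prior predicts this is high for an AT classifier."""
    num = (p_a * p_b).sum()
    return num / (np.linalg.norm(p_a) * np.linalg.norm(p_b) + 1e-12)

def adversarial_confusion_loss(probs, y):
    """Toy uncertainty-weighted confusion loss (assumed form): penalize the
    most confused wrong class, weighting each sample by normalized entropy."""
    n, c = probs.shape
    off = confusion_pattern(probs, y)
    top_confused = off.max(axis=1)                        # prob of most confused class
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    weights = entropy / np.log(c)                         # uncertainty in [0, 1]
    return float((weights * top_confused).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
y = np.array([0, 1, 2, 3])
# Simulate two different attacks that both push mass toward the same
# ambiguous class (index 4), producing a consistent confusion pattern:
attack_a = logits.copy(); attack_a[:, 4] += 2.0
attack_b = logits.copy(); attack_b[:, 4] += 1.5
pa = confusion_pattern(softmax(attack_a), y)
pb = confusion_pattern(softmax(attack_b), y)
print(confusion_consistency(pa, pb))   # high similarity -> consistent confusion
print(adversarial_confusion_loss(softmax(attack_a), y))
```

Minimizing such a loss during adversarial training would push down exactly the confusion pattern that, by the 3C prior, is shared with attacks never seen in training, which is the intuition behind using the prior as a bridge to agnostic attacks.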
Source journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
CiteScore: 28.40
Self-citation rate: 3.00%
Articles per year: 885
Review time: 8.5 months
Journal overview: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.