Understanding adversarial robustness of deep neural networks via decision reliance

IF 4.2 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing | Pub Date: 2025-09-27 | DOI: 10.1016/j.imavis.2025.105743

Soyoun Won, Hyeon Bae Kim, Yong Hyun Ahn, Hong Joo Lee, Seong Tae Kim

Image and Vision Computing, volume 163, Article 105743. Journal Article. https://www.sciencedirect.com/science/article/pii/S0262885625003312

Citations: 0

Abstract

Adversarial robustness has become a major concern as machine learning models are increasingly deployed in high-risk and high-impact applications. Accordingly, various adversarial training strategies have been proposed to make models more robust under adversarial attack. However, like deep neural networks (DNNs) themselves, the mechanisms through which adversarial training strategies improve model robustness remain opaque. In this paper, we reveal how adversarial training alters the internal workings of deep neural networks by conducting neuron-wise decision reliance analysis. We find that adversarially vulnerable models predominantly rely on a small subset of predictive neurons, while adversarially robust models tend to distribute their reliance across a broader range of neurons. We validate the relationship between decision reliance and adversarial robustness through comprehensive experiments across various models, training objectives, and attack scenarios. We observe that this relationship also holds for standard-trained models, including those trained with Mixup or CutMix, which demonstrate improved performance against one-step adversarial attacks. Furthermore, we show that minimizing decision reliance leads to improved adversarial robustness. Our findings enrich the understanding of adversarially trained models and offer an interpretable and efficient approach to analyzing their internal mechanisms.
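
The abstract does not say how decision reliance is computed, so the following is only an illustrative sketch of the contrast it describes (reliance concentrated in a few neurons versus spread across many). It assumes a hypothetical proxy: each neuron's contribution to the predicted logit is taken as activation times weight in a final linear layer, and concentration is measured as the number of neurons needed to cover 90% of the total absolute contribution. The function names, the 90% threshold, and the toy values are all assumptions and are not taken from the paper.

```python
# Hypothetical proxy for "decision reliance"; NOT the paper's definition.
# Assumption: a neuron's contribution to a class logit is
# activation * weight in a final linear layer.


def neuron_contributions(activations, class_weights):
    """Per-neuron contribution to one class logit of a linear layer."""
    if len(activations) != len(class_weights):
        raise ValueError("activations and class_weights must have equal length")
    return [a * w for a, w in zip(activations, class_weights)]


def neurons_for_mass(contributions, mass=0.9):
    """Smallest number of neurons whose absolute contributions cover
    `mass` of the total absolute contribution (0 if all are zero)."""
    magnitudes = sorted((abs(c) for c in contributions), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        return 0
    running = 0.0
    for count, magnitude in enumerate(magnitudes, start=1):
        running += magnitude
        # Small tolerance guards against floating-point rounding.
        if running >= mass * total - 1e-12:
            return count
    return len(magnitudes)


# Toy values invented for illustration only.
unit_weights = [1.0, 1.0, 1.0, 1.0]
concentrated = neuron_contributions([9.0, 0.5, 0.3, 0.2], unit_weights)
distributed = neuron_contributions([2.5, 2.5, 2.5, 2.5], unit_weights)

k_concentrated = neurons_for_mass(concentrated)  # one neuron dominates
k_distributed = neurons_for_mass(distributed)    # all neurons are needed
```

Under this proxy, a smaller count means the prediction leans on fewer neurons. Whether this matches the paper's measure would have to be checked against the full text.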

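Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Publication count: 143
Review time: 7.8 months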

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.