Correcting the distribution of batch normalization signals for Trojan mitigation

Xi Li, Zhen Xiang, David J. Miller, George Kesidis

Neurocomputing, Volume 614, Article 128752. Published 7 November 2024. DOI: 10.1016/j.neucom.2024.128752. Available at: https://www.sciencedirect.com/science/article/pii/S0925231224015236
Abstract: Backdoor (Trojan) attacks represent a significant adversarial threat to deep neural networks (DNNs). In such attacks, the presence of an attacker's backdoor trigger causes a test instance to be misclassified into the attacker's chosen target class. Post-training mitigation methods aim to rectify these misclassifications, ensuring that poisoned models correctly classify backdoor-triggered samples. These methods require the defender to have access to a small, clean dataset and the potentially compromised DNN. However, most defenses rely on parameter fine-tuning, making their effectiveness dependent on the dataset size available to the defender. To overcome the limitations of existing approaches, we propose a method that rectifies misclassifications by correcting the altered distribution of internal layer activations of backdoor-triggered instances. Distribution alterations are corrected by applying simple transformations to internal activations. Notably, our method does not modify any trainable parameters of the DNN, yet it achieves generally good mitigation performance against various backdoor attacks and benchmarks. Consequently, our approach demonstrates robustness even with a limited amount of clean data, making it highly practical for real-world applications. The effectiveness of our approach is validated through both theoretical analysis and extensive experimentation. The appendix is provided as electronic supplementary material and can be accessed via the link in footnote 2. The source code is available via the link in footnote 3.
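The abstract describes the core idea only at a high level: trigger-carrying inputs shift the distribution of activations entering internal (batch normalization) layers, and a simple transformation that maps this distribution back toward clean-data statistics can restore correct classification without changing any trainable parameters. The following is a minimal PyTorch-style sketch of that general idea, not the authors' actual procedure; the function names (estimate_clean_stats, attach_correction), the choice of matching per-channel mean and standard deviation, and the use of forward pre-hooks are illustrative assumptions.

```python
import torch

# Hypothetical illustration: estimate per-channel statistics of a BatchNorm
# layer's *input* activations on a small clean dataset, then, at test time,
# re-standardize incoming activations toward those clean statistics via a
# forward pre-hook. No trainable parameters of the network are modified.
# Assumes 4D convolutional activations of shape (N, C, H, W).

@torch.no_grad()
def estimate_clean_stats(model, bn_layer, clean_loader, device="cpu"):
    """Collect per-channel mean/std of activations entering `bn_layer`
    over a small clean dataset."""
    feats = []

    def capture(module, inputs):
        x = inputs[0].detach()
        # (N, C, H, W) -> (C, N*H*W)
        feats.append(x.transpose(0, 1).reshape(x.size(1), -1))

    handle = bn_layer.register_forward_pre_hook(capture)
    model.eval()
    for x, _ in clean_loader:
        model(x.to(device))
    handle.remove()

    all_feats = torch.cat(feats, dim=1)  # (C, total_elements)
    return all_feats.mean(dim=1), all_feats.std(dim=1)


def attach_correction(bn_layer, clean_mean, clean_std, eps=1e-5):
    """Attach a pre-hook that maps each channel's test-time activation
    distribution back toward the clean-data distribution via a simple
    affine transformation (an assumed form of correction)."""
    def correct(module, inputs):
        x = inputs[0]
        dims = (0, 2, 3)
        cur_mean = x.mean(dim=dims, keepdim=True)
        cur_std = x.std(dim=dims, keepdim=True)
        target_mean = clean_mean.view(1, -1, 1, 1).to(x.device)
        target_std = clean_std.view(1, -1, 1, 1).to(x.device)
        # Standardize with current batch statistics, then rescale/shift
        # to the statistics estimated from clean data.
        x_corr = (x - cur_mean) / (cur_std + eps) * target_std + target_mean
        return (x_corr,)

    return bn_layer.register_forward_pre_hook(correct)
```

Because the correction in this sketch is applied through inference-time hooks, the network's weights, batch-normalization scale/shift parameters, and running statistics all remain unchanged, mirroring the parameter-free property the abstract emphasizes.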
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The essential topics covered are neurocomputing theory, practice, and applications.