Diffense: Defense Against Backdoor Attacks on Deep Neural Networks With Latent Diffusion

IF 3.7 2区工程技术 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-09-27 DOI:10.1109/JETCAS.2024.3469377

Bowen Hu;Chip-Hong Chang

{"title":"Diffense: Defense Against Backdoor Attacks on Deep Neural Networks With Latent Diffusion","authors":"Bowen Hu;Chip-Hong Chang","doi":"10.1109/JETCAS.2024.3469377","DOIUrl":null,"url":null,"abstract":"As deep neural network (DNN) models are used in a wide variety of applications, their security has attracted considerable attention. Among the known security vulnerabilities, backdoor attacks have become the most notorious threat to users of pre-trained DNNs and machine learning services. Such attacks manipulate the training data or training process in such a way that the trained model produces a false output to an input that carries a specific trigger, but behaves normally otherwise. In this work, we propose Diffense, a method for detecting such malicious inputs based on the distribution of the latent feature maps to clean input samples of the possibly infected target DNN. By learning the feature map distribution using the diffusion model and sampling from the model under the guidance of the data to be inspected, backdoor attack data can be detected by its distance from the sampled result. Diffense does not require knowledge about the structure, weights, and training data of the target DNN model, nor does it need to be aware of the backdoor attack method. Diffense is non-intrusive. The accuracy of the target model to clean inputs will not be affected by Diffense and the inference service can be run uninterruptedly with Diffense. Extensive experiments were conducted on DNNs trained for MNIST, CIFRA-10, GSTRB, ImageNet-10, LSUN Object and LSUN Scene applications to show that the attack success rates of diverse backdoor attacks, including BadNets, IDBA, WaNet, ISSBA and HTBA, can be significantly suppressed by Diffense. The results generally exceed the performances of existing backdoor mitigation methods, including those that require model modifications or prerequisite knowledge of model weights or attack samples.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 4","pages":"729-742"},"PeriodicalIF":3.7000,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10697229/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

Abstract

As deep neural network (DNN) models are used in a wide variety of applications, their security has attracted considerable attention. Among the known security vulnerabilities, backdoor attacks have become the most notorious threat to users of pre-trained DNNs and machine learning services. Such attacks manipulate the training data or training process in such a way that the trained model produces a false output to an input that carries a specific trigger, but behaves normally otherwise. In this work, we propose Diffense, a method for detecting such malicious inputs based on the distribution of the latent feature maps to clean input samples of the possibly infected target DNN. By learning the feature map distribution using the diffusion model and sampling from the model under the guidance of the data to be inspected, backdoor attack data can be detected by its distance from the sampled result. Diffense does not require knowledge about the structure, weights, and training data of the target DNN model, nor does it need to be aware of the backdoor attack method. Diffense is non-intrusive. The accuracy of the target model to clean inputs will not be affected by Diffense and the inference service can be run uninterruptedly with Diffense. Extensive experiments were conducted on DNNs trained for MNIST, CIFRA-10, GSTRB, ImageNet-10, LSUN Object and LSUN Scene applications to show that the attack success rates of diverse backdoor attacks, including BadNets, IDBA, WaNet, ISSBA and HTBA, can be significantly suppressed by Diffense. The results generally exceed the performances of existing backdoor mitigation methods, including those that require model modifications or prerequisite knowledge of model weights or attack samples.

查看原文本刊更多论文

差分：基于潜在扩散的深度神经网络后门攻击防御

随着深度神经网络（DNN）模型的广泛应用，其安全性引起了人们的广泛关注。在已知的安全漏洞中，后门攻击已成为预训练dnn和机器学习服务用户最臭名昭著的威胁。这种攻击以这样一种方式操纵训练数据或训练过程，即训练模型对带有特定触发器的输入产生错误输出，但行为正常。在这项工作中，我们提出了Diffense，一种基于潜在特征映射的分布来检测这种恶意输入的方法，以清除可能被感染的目标DNN的输入样本。利用扩散模型学习特征映射分布，并在待检测数据的指导下对模型进行采样，通过与采样结果的距离来检测后门攻击数据。diffence不需要了解目标DNN模型的结构、权重和训练数据，也不需要知道后门攻击方法。diffence是非侵入性的。目标模型清理输入的准确性不会受到Diffense的影响，并且推理服务可以使用Diffense不间断地运行。在MNIST、CIFRA-10、GSTRB、ImageNet-10、LSUN Object和LSUN Scene应用中训练的dnn上进行了大量实验，结果表明Diffense可以显著抑制包括BadNets、IDBA、WaNet、ISSBA和HTBA在内的各种后门攻击的攻击成功率。结果通常超过现有后门缓解方法的性能，包括那些需要修改模型或预先了解模型权重或攻击样本的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Journal on Emerging and Selected Topics in Circuits and Systems ENGINEERING, ELECTRICAL & ELECTRONIC-

CiteScore

8.50

自引率

2.20%

发文量

期刊介绍： The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.