DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation

Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security Pub Date : 2023-06-13 DOI:10.1145/3579856.3582822

Zhicong Yan, Shenghong Li, Ruijie Zhao, Yuan Tian, Yuanyuan Zhao

{"title":"DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation","authors":"Zhicong Yan, Shenghong Li, Ruijie Zhao, Yuan Tian, Yuanyuan Zhao","doi":"10.1145/3579856.3582822","DOIUrl":null,"url":null,"abstract":"Backdoor attacks have emerged as an urgent threat to Deep Neural Networks (DNNs), where victim DNNs are furtively implanted with malicious neurons that could be triggered by the adversary. To defend against backdoor attacks, many works establish a staged pipeline to remove backdoors from victim DNNs: inspecting, locating, and erasing. However, in a scenario where a few clean data can be accessible, such pipeline is fragile and cannot erase backdoors completely without sacrificing model accuracy. To address this issue, in this paper, we propose a novel data-free holistic backdoor erasing (DHBE) framework. Instead of the staged pipeline, the DHBE treats the backdoor erasing task as a unified adversarial procedure, which seeks equilibrium between two different competing processes: distillation and backdoor regularization. In distillation, the backdoored DNN is distilled into a proxy model, transferring its knowledge about clean data, yet backdoors are simultaneously transferred. In backdoor regularization, the proxy model is holistically regularized to prevent from infecting any possible backdoor transferred from distillation. These two processes jointly proceed with data-free adversarial optimization until a clean, high-accuracy proxy model is obtained. With the novel adversarial design, our framework demonstrates its superiority in three aspects: 1) minimal detriment to model accuracy, 2) high tolerance for hyperparameters, and 3) no demand for clean data. Extensive experiments on various backdoor attacks and datasets are performed to verify the effectiveness of the proposed framework. Code is available at https://github.com/yanzhicong/DHBE","PeriodicalId":156082,"journal":{"name":"Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3579856.3582822","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

Abstract

Backdoor attacks have emerged as an urgent threat to Deep Neural Networks (DNNs), where victim DNNs are furtively implanted with malicious neurons that could be triggered by the adversary. To defend against backdoor attacks, many works establish a staged pipeline to remove backdoors from victim DNNs: inspecting, locating, and erasing. However, in a scenario where a few clean data can be accessible, such pipeline is fragile and cannot erase backdoors completely without sacrificing model accuracy. To address this issue, in this paper, we propose a novel data-free holistic backdoor erasing (DHBE) framework. Instead of the staged pipeline, the DHBE treats the backdoor erasing task as a unified adversarial procedure, which seeks equilibrium between two different competing processes: distillation and backdoor regularization. In distillation, the backdoored DNN is distilled into a proxy model, transferring its knowledge about clean data, yet backdoors are simultaneously transferred. In backdoor regularization, the proxy model is holistically regularized to prevent from infecting any possible backdoor transferred from distillation. These two processes jointly proceed with data-free adversarial optimization until a clean, high-accuracy proxy model is obtained. With the novel adversarial design, our framework demonstrates its superiority in three aspects: 1) minimal detriment to model accuracy, 2) high tolerance for hyperparameters, and 3) no demand for clean data. Extensive experiments on various backdoor attacks and datasets are performed to verify the effectiveness of the proposed framework. Code is available at https://github.com/yanzhicong/DHBE

查看原文本刊更多论文

基于限制对抗蒸馏的深度神经网络无数据整体后门擦除

后门攻击已经成为深度神经网络(dnn)面临的紧迫威胁，在这些攻击中，受害者dnn被秘密植入恶意神经元，这些神经元可能被对手触发。为了防御后门攻击，许多工作建立了一个阶段性的管道来清除受害dnn的后门:检查、定位、擦除。然而，在可以访问少量干净数据的场景中，这种管道是脆弱的，不能在不牺牲模型准确性的情况下完全清除后门。为了解决这个问题，在本文中，我们提出了一种新的无数据整体后门擦除(DHBE)框架。DHBE将后门擦除任务视为一个统一的对抗过程，而不是分阶段的管道，它寻求两个不同竞争过程之间的平衡:蒸馏和后门正则化。在蒸馏中，后门深度神经网络被提炼成一个代理模型，转移其关于干净数据的知识，同时转移后门。在后门正则化中，代理模型被整体正则化，以防止从蒸馏转移的任何可能的后门被感染。这两个过程共同进行无数据对抗优化，直到获得一个干净、高精度的代理模型。通过新的对抗性设计，我们的框架在三个方面展示了其优势:1)对模型精度的损害最小，2)对超参数的高容忍度，以及3)不需要干净的数据。在各种后门攻击和数据集上进行了大量实验，以验证所提出框架的有效性。代码可从https://github.com/yanzhicong/DHBE获得

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security

自引率

0.00%

发文量