TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

arXiv - CS - Cryptography and Security, Pub Date: 2024-09-09, DOI: arxiv-2409.05294

Yichuan Mo, Hui Huang, Mingjie Li, Ang Li, Yisen Wang



Abstract

Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise their integrity by producing specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this threat. Specifically, we propose TERD, a backdoor defense framework that builds a unified model of current attacks, which enables us to derive an accessible reversed loss. A trigger reversion strategy is then employed: the trigger is first approximated using noise sampled from a prior distribution and then refined through differential multi-step samplers. With the reversed trigger, we further propose backdoor detection from the noise space, introducing the first backdoor input detection approach for diffusion models and a novel model detection algorithm that calculates the KL divergence between the reversed and benign distributions. Extensive evaluations demonstrate that TERD achieves a 100% True Positive Rate (TPR) and a 100% True Negative Rate (TNR) across datasets of varying resolutions. TERD also adapts well to other Stochastic Differential Equation (SDE)-based models. Our code is available at https://github.com/PKU-ML/TERD.
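To make the model detection idea concrete, the following is a minimal sketch of a KL-divergence check between a reversed distribution and a benign one. It is not the authors' implementation. It assumes that the benign noise distribution is a standard Gaussian and that the reversed trigger can be summarized by a per-dimension mean and standard deviation. The function names and the threshold are hypothetical and do not come from the paper or its repository.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mean: Sequence[float], std: Sequence[float]) -> float:
    """KL( N(mean, diag(std^2)) || N(0, I) ), summed over all dimensions.

    Assumption: the benign distribution is a standard Gaussian and the
    reversed distribution is a diagonal Gaussian.
    """
    if len(mean) != len(std):
        raise ValueError("mean and std must have the same length")
    total = 0.0
    for m, s in zip(mean, std):
        if s <= 0.0:
            raise ValueError("standard deviations must be positive")
        var = s * s
        # Closed-form KL for one Gaussian dimension against N(0, 1).
        total += 0.5 * (var + m * m - 1.0 - math.log(var))
    return total


def looks_backdoored(mean: Sequence[float], std: Sequence[float],
                     threshold: float) -> bool:
    """Flag a model when the reversed distribution is far from benign noise.

    The threshold is a hypothetical parameter that would need to be
    calibrated on models known to be clean.
    """
    return kl_to_standard_normal(mean, std) > threshold


if __name__ == "__main__":
    # A reversed distribution identical to benign noise has zero divergence.
    print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))
    # A shifted mean, as a trigger would induce, increases the divergence.
    print(looks_backdoored([2.0, 2.0], [1.0, 1.0], threshold=1.0))
```

In this sketch a larger divergence means that the reversed distribution deviates more strongly from benign noise. How the reversed distribution is actually estimated and how the decision threshold is chosen are described in the paper and the linked repository, not here.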


