TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors
Yichuan Mo, Hui Huang, Mingjie Li, Ang Li, Yisen Wang
arXiv - CS - Cryptography and Security, published 2024-09-09. DOI: arxiv-2409.05294
Citation count: 0
Abstract
Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise their integrity by producing specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this threat. Specifically, we propose TERD, a backdoor defense framework that builds a unified model of current attacks, enabling us to derive an accessible reversed loss. A trigger reversion strategy is then employed: the trigger is first approximated from noise sampled from a prior distribution and then refined through differential multi-step samplers. With the reversed trigger, we further propose backdoor detection in the noise space, introducing the first backdoor input detection approach for diffusion models and a novel model detection algorithm that computes the KL divergence between the reversed and benign distributions. Extensive evaluations demonstrate that TERD achieves a 100% True Positive Rate (TPR) and True Negative Rate (TNR) across datasets of varying resolutions. TERD also adapts well to other Stochastic Differential Equation (SDE)-based models. Our code is available at https://github.com/PKU-ML/TERD.
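The model-detection step described above compares the noise distribution induced by the reversed trigger against the benign prior via KL divergence. As a minimal illustrative sketch (not the authors' implementation; the function name and the diagonal-Gaussian assumption are ours), the closed-form KL divergence from a diagonal Gaussian to the standard normal prior N(0, I) is 0.5 * (sigma^2 + mu^2 - 1) - log(sigma), summed over dimensions:

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return float(np.sum(0.5 * (sigma**2 + mu**2 - 1.0) - np.log(sigma)))

# A benign model's reversed noise matches the prior, so the KL is ~0:
print(gaussian_kl_to_standard_normal(np.zeros(4), np.ones(4)))      # 0.0

# A backdoored model's reversed trigger shifts the distribution, so the KL is large:
print(gaussian_kl_to_standard_normal(np.full(4, 2.0), np.ones(4)))  # 8.0
```

Under this reading, thresholding such a divergence separates backdoored models (large KL) from benign ones (KL near zero); the actual statistic TERD computes is specified in the paper, not here.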