Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai
{"title":"LoRID: Low-Rank Iterative Diffusion for Adversarial Purification","authors":"Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai","doi":"arxiv-2409.08255","DOIUrl":null,"url":null,"abstract":"This work presents an information-theoretic examination of diffusion-based\npurification methods, the state-of-the-art adversarial defenses that utilize\ndiffusion models to remove malicious perturbations in adversarial examples. By\ntheoretically characterizing the inherent purification errors associated with\nthe Markov-based diffusion purifications, we introduce LoRID, a novel Low-Rank\nIterative Diffusion purification method designed to remove adversarial\nperturbation with low intrinsic purification errors. LoRID centers around a\nmulti-stage purification process that leverages multiple rounds of\ndiffusion-denoising loops at the early time-steps of the diffusion models, and\nthe integration of Tucker decomposition, an extension of matrix factorization,\nto remove adversarial noise at high-noise regimes. Consequently, LoRID\nincreases the effective diffusion time-steps and overcomes strong adversarial\nattacks, achieving superior robustness performance in CIFAR-10/100, CelebA-HQ,\nand ImageNet datasets under both white-box and black-box settings.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08255","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
This work presents an information-theoretic examination of diffusion-based
purification methods, the state-of-the-art adversarial defenses that utilize
diffusion models to remove malicious perturbations in adversarial examples. By
theoretically characterizing the inherent purification errors associated with
the Markov-based diffusion purifications, we introduce LoRID, a novel Low-Rank
Iterative Diffusion purification method designed to remove adversarial
perturbation with low intrinsic purification errors. LoRID centers around a
multi-stage purification process that leverages multiple rounds of
diffusion-denoising loops at the early time-steps of the diffusion models, and
the integration of Tucker decomposition, an extension of matrix factorization,
to remove adversarial noise at high-noise regimes. Consequently, LoRID
increases the effective diffusion time-steps and overcomes strong adversarial
attacks, achieving superior robustness performance in CIFAR-10/100, CelebA-HQ,
and ImageNet datasets under both white-box and black-box settings.