LoRID: Low-Rank Iterative Diffusion for Adversarial Purification

arXiv - CS - Machine Learning, Pub Date: 2024-09-12, DOI: arxiv-2409.08255

Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai

Citations: 0

Abstract

This work presents an information-theoretic examination of diffusion-based purification methods, the state-of-the-art adversarial defenses that use diffusion models to remove malicious perturbations from adversarial examples. By theoretically characterizing the inherent purification errors of Markov-based diffusion purification, we introduce LoRID, a novel Low-Rank Iterative Diffusion purification method designed to remove adversarial perturbations with low intrinsic purification error. LoRID centers on a multi-stage purification process that leverages multiple rounds of diffusion-denoising loops at the early time-steps of the diffusion model, together with Tucker decomposition, an extension of matrix factorization, to remove adversarial noise in high-noise regimes. Consequently, LoRID increases the effective number of diffusion time-steps and overcomes strong adversarial attacks, achieving superior robustness on the CIFAR-10/100, CelebA-HQ, and ImageNet datasets under both white-box and black-box settings.
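The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a truncated higher-order SVD as one simple way to obtain a Tucker approximation, and a standard DDPM-style forward step for the repeated diffusion-denoising loop. The names `truncated_hosvd`, `reconstruct`, and `iterative_purify`, the `denoise` callable, and the parameters `ranks`, `alpha_bar_t`, and `loops` are all hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Tucker approximation via truncated HOSVD (an illustrative choice)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # keep the leading singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each subspace
    return core, factors


def reconstruct(core, factors):
    """Rebuild the low-rank tensor from the core and factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


def iterative_purify(x, denoise, alpha_bar_t, loops, rng):
    """Repeat a noise-then-denoise round `loops` times.

    `denoise` is a hypothetical stand-in for a trained diffusion model's
    reverse process; `alpha_bar_t` is the cumulative noise-schedule value
    at the chosen (early) time-step.
    """
    for _ in range(loops):
        noise = rng.standard_normal(x.shape)
        x_t = np.sqrt(alpha_bar_t) * x + np.sqrt(1.0 - alpha_bar_t) * noise
        x = denoise(x_t)
    return x
```

With `ranks` equal to the tensor's own shape the reconstruction is exact up to floating-point error; smaller ranks discard the trailing components along each mode. How the low-rank step and the loops are ordered and tuned is left to the paper itself.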



