{"title":"PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning","authors":"Yukai Xu, Yujie Gu, Kouichi Sakurai","doi":"arxiv-2409.12072","DOIUrl":null,"url":null,"abstract":"Backdoor attacks pose a significant threat to deep neural networks,\nparticularly as recent advancements have led to increasingly subtle\nimplantation, making the defense more challenging. Existing defense mechanisms\ntypically rely on an additional clean dataset as a standard reference and\ninvolve retraining an auxiliary model or fine-tuning the entire victim model.\nHowever, these approaches are often computationally expensive and not always\nfeasible in practical applications. In this paper, we propose a novel and\nlightweight defense mechanism, termed PAD-FT, that does not require an\nadditional clean dataset and fine-tunes only a very small part of the model to\ndisinfect the victim model. To achieve this, our approach first introduces a\nsimple data purification process to identify and select the most-likely clean\ndata from the poisoned training dataset. The self-purified clean dataset is\nthen used for activation clipping and fine-tuning only the last classification\nlayer of the victim model. By integrating data purification, activation\nclipping, and classifier fine-tuning, our mechanism PAD-FT demonstrates\nsuperior effectiveness across multiple backdoor attack methods and datasets, as\nconfirmed through extensive experimental evaluation.","PeriodicalId":501332,"journal":{"name":"arXiv - CS - Cryptography and Security","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Cryptography and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Backdoor attacks pose a significant threat to deep neural networks, particularly as recent advances have made trigger implantation increasingly subtle, making defense correspondingly more challenging. Existing defense mechanisms typically rely on an additional clean dataset as a trusted reference and involve retraining an auxiliary model or fine-tuning the entire victim model. However, these approaches are often computationally expensive and not always feasible in practice. In this paper, we propose a novel, lightweight defense mechanism, termed PAD-FT, that requires no additional clean dataset and fine-tunes only a very small part of the victim model to disinfect it. Our approach first introduces a simple data purification process that identifies and selects the samples most likely to be clean from the poisoned training dataset. This self-purified dataset is then used for activation clipping and for fine-tuning only the last classification layer of the victim model. By integrating data purification, activation clipping, and classifier fine-tuning, PAD-FT demonstrates superior effectiveness against multiple backdoor attack methods across several datasets, as confirmed through extensive experimental evaluation.
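To make the three-stage pipeline concrete, the sketch below shows one plausible PAD-FT-style implementation in PyTorch. The abstract does not specify the purification criterion, the clipping rule, or the model architecture, so the loss-based sample selection, the per-layer activation bounds estimated on the purified data, and the `fc` head attribute are illustrative assumptions rather than the authors' actual method.

```python
# Minimal sketch of a PAD-FT-style defense (assumptions noted inline).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset

def purify(model, dataset, keep_ratio=0.2, device="cpu"):
    """Select the fraction of training samples treated as most likely clean.
    Assumption: rank samples by cross-entropy loss under the victim model
    and keep the lowest-loss fraction; the paper's criterion may differ."""
    model.eval()
    losses = []
    loader = DataLoader(dataset, batch_size=256, shuffle=False)
    with torch.no_grad():
        for x, y in loader:
            out = model(x.to(device))
            losses.append(nn.functional.cross_entropy(
                out, y.to(device), reduction="none").cpu())
    losses = torch.cat(losses)
    k = int(keep_ratio * len(dataset))
    keep_idx = torch.argsort(losses)[:k]  # lowest loss = assumed clean
    return Subset(dataset, keep_idx.tolist())

def add_activation_clipping(model, clean_loader, device="cpu"):
    """Clamp post-ReLU activations to per-layer bounds observed on the
    purified data (the exact bound rule is an assumption here)."""
    bounds, hooks = {}, []
    def record(name):
        def fn(module, inputs, out):
            bounds[name] = max(bounds.get(name, 0.0), out.max().item())
        return fn
    for name, m in model.named_modules():
        if isinstance(m, nn.ReLU):
            hooks.append(m.register_forward_hook(record(name)))
    model.eval()
    with torch.no_grad():
        for x, _ in clean_loader:        # record maxima on purified data
            model(x.to(device))
    for h in hooks:
        h.remove()
    for name, m in model.named_modules():  # install clamping hooks
        if isinstance(m, nn.ReLU) and name in bounds:
            b = bounds[name]
            m.register_forward_hook(
                lambda module, inputs, out, b=b: out.clamp(max=b))

def finetune_classifier(model, clean_loader, head_name="fc",
                        epochs=3, lr=1e-3, device="cpu"):
    """Fine-tune only the final classification layer on the purified data.
    `head_name` is a hypothetical attribute name for the last layer."""
    for p in model.parameters():
        p.requires_grad = False
    head = getattr(model, head_name)
    for p in head.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    model.eval()  # keep BatchNorm statistics frozen; gradients still flow
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            loss = nn.functional.cross_entropy(model(x.to(device)),
                                               y.to(device))
            loss.backward()
            opt.step()
```

One caveat on the selection heuristic: successfully backdoored samples are also predicted confidently by the victim model, so low loss alone is an imperfect proxy for cleanliness; the criterion actually used for "most-likely clean" selection should be taken from the full paper.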