Mitigating Backdoor Attacks in Pre-Trained Encoders via Self-Supervised Knowledge Distillation

IF 5.5 · CAS Region 2 (Computer Science) · JCR Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Rongfang Bie;Jinxiu Jiang;Hongcheng Xie;Yu Guo;Yinbin Miao;Xiaohua Jia
{"title":"Mitigating Backdoor Attacks in Pre-Trained Encoders via Self-Supervised Knowledge Distillation","authors":"Rongfang Bie;Jinxiu Jiang;Hongcheng Xie;Yu Guo;Yinbin Miao;Xiaohua Jia","doi":"10.1109/TSC.2024.3417279","DOIUrl":null,"url":null,"abstract":"Pre-trained encoders in computer vision have recently received great attention from both research and industry communities. Among others, a promising paradigm is to utilize self-supervised learning (SSL) to train image encoders with massive unlabeled samples, thereby endowing encoders with the capability to embed abundant knowledge into the feature representations. Backdoor attacks on SSL disrupt the encoder's feature extraction capabilities, causing downstream classifiers to inherit backdoor behavior and leading to misclassification. Existing backdoor defense methods primarily focus on supervised learning scenarios and cannot be effectively migrated to SSL pre-trained encoders. In this article, we present a backdoor defense scheme based on self-supervised knowledge distillation. Our approach aims to eliminate backdoors while preserving the feature extraction capability using the downstream dataset. We incorporate the benefits of contrastive and non-contrastive SSL methods for knowledge distillation, ensuring differentiation between the representations of various classes and the consistency of representations within the same class. Consequently, the extraction capability of pre-trained encoders is preserved. Extensive experiments against multiple attacks demonstrate that the proposed scheme outperforms the state-of-the-art solutions.","PeriodicalId":13255,"journal":{"name":"IEEE Transactions on Services Computing","volume":null,"pages":null},"PeriodicalIF":5.5000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Services Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10579882/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Pre-trained encoders in computer vision have recently received great attention from both research and industry communities. Among others, a promising paradigm is to utilize self-supervised learning (SSL) to train image encoders with massive unlabeled samples, thereby endowing encoders with the capability to embed abundant knowledge into the feature representations. Backdoor attacks on SSL disrupt the encoder's feature extraction capabilities, causing downstream classifiers to inherit backdoor behavior and leading to misclassification. Existing backdoor defense methods primarily focus on supervised learning scenarios and cannot be effectively migrated to SSL pre-trained encoders. In this article, we present a backdoor defense scheme based on self-supervised knowledge distillation. Our approach aims to eliminate backdoors while preserving the feature extraction capability using the downstream dataset. We incorporate the benefits of contrastive and non-contrastive SSL methods for knowledge distillation, ensuring differentiation between the representations of various classes and the consistency of representations within the same class. Consequently, the extraction capability of pre-trained encoders is preserved. Extensive experiments against multiple attacks demonstrate that the proposed scheme outperforms the state-of-the-art solutions.
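The abstract describes the defense only at a high level. As a rough illustration of the idea, the sketch below (a hypothetical PyTorch fragment, not the paper's implementation) distills a student copy of the encoder from the frozen, possibly backdoored teacher on clean downstream data, combining a contrastive InfoNCE-style term (to keep representations of different samples separated) with a non-contrastive cosine-consistency term (to keep each student representation aligned with its teacher counterpart). The specific loss forms, the equal weighting, and all module and variable names are assumptions for illustration.

```python
# Hypothetical sketch of self-supervised knowledge distillation for backdoor
# mitigation, following the high-level description in the abstract. The exact
# losses, architectures, and hyperparameters are NOT from the paper; the
# contrastive (InfoNCE-style) and non-contrastive (cosine-consistency) terms
# below are illustrative assumptions only.
import torch
import torch.nn.functional as F


def distillation_losses(student_feats, teacher_feats, temperature=0.1):
    """Return a contrastive and a non-contrastive distillation term.

    student_feats, teacher_feats: (N, D) feature batches from the student
    encoder being purified and the frozen, possibly backdoored teacher.
    """
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)

    # Non-contrastive term: keep each student representation consistent with
    # the teacher representation of the same image (cosine similarity).
    consistency = 1.0 - (s * t).sum(dim=1).mean()

    # Contrastive term (InfoNCE-style): each student feature should match its
    # own teacher feature and be pushed away from other samples in the batch,
    # so class-discriminative structure survives distillation.
    logits = s @ t.T / temperature                   # (N, N) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    contrastive = F.cross_entropy(logits, targets)

    return contrastive, consistency


# Hypothetical usage: one distillation step on a clean downstream batch.
if __name__ == "__main__":
    teacher = torch.nn.Linear(3 * 32 * 32, 128)      # stand-in for the pre-trained encoder
    student = torch.nn.Linear(3 * 32 * 32, 128)      # copy being purified
    student.load_state_dict(teacher.state_dict())
    optimizer = torch.optim.SGD(student.parameters(), lr=1e-3)

    images = torch.randn(64, 3 * 32 * 32)            # clean downstream samples (flattened)
    with torch.no_grad():
        t_feats = teacher(images)
    s_feats = student(images)

    contrastive, consistency = distillation_losses(s_feats, t_feats)
    loss = contrastive + consistency                 # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"contrastive={contrastive.item():.4f} consistency={consistency.item():.4f}")
```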
Source Journal

IEEE Transactions on Services Computing
Categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore: 11.50
Self-citation rate: 6.20%
Articles per year: 278
Review time: >12 weeks
Journal description: IEEE Transactions on Services Computing encompasses the computing and software aspects of the science and technology of services innovation research and development. It places emphasis on algorithmic, mathematical, statistical, and computational methods central to services computing. Topics covered include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The transactions address mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services. It also covers areas such as composite web service creation, business and scientific applications, standards, utility models, business process modeling, integration, and collaboration in the realm of Services Computing.