Mutual Information Guided Backdoor Mitigation for Pre-Trained Encoders
Authors: Tingxu Han; Weisong Sun; Ziqi Ding; Chunrong Fang; Hanwei Qian; Jiaxun Li; Zhenyu Chen; Xiangyu Zhang
Journal: IEEE Transactions on Information Forensics and Security, vol. 20, pp. 3414-3428 (Journal Article)
Published: 2025-03-18
DOI: 10.1109/TIFS.2025.3550062
URL: https://ieeexplore.ieee.org/document/10930652/
Impact factor: 6.3; JCR: Q1 (Computer Science, Theory & Methods)
Abstract
Self-supervised learning (SSL) is increasingly attractive for pre-training encoders without labeled data. Downstream tasks built on top of these pre-trained encoders can achieve nearly state-of-the-art performance. However, existing studies demonstrate that encoders pre-trained with SSL are vulnerable to backdoor attacks. Numerous backdoor mitigation techniques have been designed for downstream task models, but their effectiveness is limited when they are adapted to pre-trained encoders, because no label information is available during pre-training. To address backdoor attacks against pre-trained encoders, we propose a mutual information guided backdoor mitigation technique named MIMIC (Mutual Information guided backdoor MItigation for pre-trained enCoders). MIMIC treats the potentially backdoored encoder as the teacher network and applies knowledge distillation to create a clean student encoder from it. Unlike existing knowledge distillation approaches, MIMIC initializes the student with random weights, so the student inherits no backdoors from the teacher. MIMIC then uses the mutual information between each layer and the extracted features to locate where benign knowledge lies in the teacher network, and uses this to guide the distillation that clones clean features from teacher to student. The distillation loss has two components, a clone loss and an attention loss, which together aim to mitigate backdoors while maintaining encoder performance. Our evaluation on two backdoor attacks in SSL demonstrates that MIMIC significantly reduces the attack success rate while using only $\leq 5\%$ of the clean pre-training data accessible to the defender, surpassing seven state-of-the-art backdoor mitigation techniques. The source code of MIMIC is available at https://github.com/wssun/MIMIC.
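The two-part distillation loss described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes the clone loss is a mean squared error between teacher and student feature maps, that the attention loss compares normalized spatial attention maps in the style of attention transfer, and that per-layer weights (hypothetically derived from mutual information scores) scale each layer's contribution. The function names, the weights, and `alpha`/`beta` are illustrative assumptions.

```python
import numpy as np

def clone_loss(teacher_feat, student_feat):
    """Assumed form: mean squared error between feature maps of equal shape."""
    return float(np.mean((teacher_feat - student_feat) ** 2))

def attention_map(feat, eps=1e-12):
    """Spatial attention for features of shape (N, C, H, W).

    Averages squared activations over channels, flattens the spatial
    dimensions, and L2-normalizes each sample's map.
    """
    amap = np.mean(feat ** 2, axis=1).reshape(feat.shape[0], -1)
    norm = np.linalg.norm(amap, axis=1, keepdims=True)
    return amap / (norm + eps)

def attention_loss(teacher_feat, student_feat):
    """Assumed form: squared distance between normalized attention maps."""
    diff = attention_map(teacher_feat) - attention_map(student_feat)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def distillation_loss(teacher_feats, student_feats, layer_weights,
                      alpha=1.0, beta=1.0):
    """Weighted sum over layers of clone loss and attention loss.

    layer_weights is a hypothetical stand-in for mutual-information
    guidance: larger weights mark layers assumed to carry benign knowledge.
    """
    total = 0.0
    for t, s, w in zip(teacher_feats, student_feats, layer_weights):
        total += w * (alpha * clone_loss(t, s) + beta * attention_loss(t, s))
    return total

# Toy feature maps standing in for two encoder layers.
rng = np.random.default_rng(0)
teacher = [rng.normal(size=(4, 8, 5, 5)), rng.normal(size=(4, 16, 3, 3))]
student = [rng.normal(size=t.shape) for t in teacher]  # randomly initialized student
weights = [0.3, 0.7]  # hypothetical per-layer weights

loss_random = distillation_loss(teacher, student, weights)
loss_identical = distillation_loss(teacher, teacher, weights)
```

In this sketch the loss is zero when the student reproduces the teacher's features exactly and positive otherwise. In an actual training loop the teacher features would be held fixed and only the student encoder would be updated.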
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.