Effective defense against physically embedded backdoor attacks via clustering-based filtering

Impact Factor 5.0 · CAS Zone 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Mohammed Kutbi
{"title":"Effective defense against physically embedded backdoor attacks via clustering-based filtering","authors":"Mohammed Kutbi","doi":"10.1007/s40747-025-01876-y","DOIUrl":null,"url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Abstract</h3><p>Backdoor attacks pose a severe threat to the integrity of machine learning models, especially in real-world image classification tasks. In such attacks, adversaries embed malicious behaviors triggered by specific patterns in the training data, causing models to misclassify whenever the trigger is present. This paper introduces a novel, <i>model-agnostic</i> defense that systematically detects and removes backdoor-infected samples using a synergy of dimensionality reduction and unsupervised clustering. Unlike most existing methods that address <i>digitally</i> added triggers, our approach specifically targets <i>physically</i> embedded triggers (e.g., a bandage placed on a face), which closely resemble real-world occlusions and are therefore harder to detect. We first extract high-level features from a trusted, pre-trained model, reduce the feature dimensionality via Principal Component Analysis (PCA), and then fit Gaussian Mixture Models (GMMs) to cluster suspicious samples. By identifying and filtering out outlying clusters, we effectively isolate poisoned images without assuming knowledge of the trigger or requiring access to the victim model. Extensive experiments on face versus non-face classification demonstrate that our defense substantially reduces attack success rates while preserving high accuracy on clean data, offering a practical and robust solution against challenging backdoor scenarios.</p><h3 data-test=\"abstract-sub-heading\">Graphic Abstract</h3>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"108 1","pages":""},"PeriodicalIF":5.0000,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Complex & Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s40747-025-01876-y","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Backdoor attacks pose a severe threat to the integrity of machine learning models, especially in real-world image classification tasks. In such attacks, adversaries embed malicious behaviors triggered by specific patterns in the training data, causing models to misclassify whenever the trigger is present. This paper introduces a novel, model-agnostic defense that systematically detects and removes backdoor-infected samples using a synergy of dimensionality reduction and unsupervised clustering. Unlike most existing methods that address digitally added triggers, our approach specifically targets physically embedded triggers (e.g., a bandage placed on a face), which closely resemble real-world occlusions and are therefore harder to detect. We first extract high-level features from a trusted, pre-trained model, reduce the feature dimensionality via Principal Component Analysis (PCA), and then fit Gaussian Mixture Models (GMMs) to cluster suspicious samples. By identifying and filtering out outlying clusters, we effectively isolate poisoned images without assuming knowledge of the trigger or requiring access to the victim model. Extensive experiments on face versus non-face classification demonstrate that our defense substantially reduces attack success rates while preserving high accuracy on clean data, offering a practical and robust solution against challenging backdoor scenarios.
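
The detection pipeline described in the abstract (feature extraction from a trusted pre-trained model, PCA-based dimensionality reduction, GMM clustering, and removal of outlying clusters) can be illustrated with a short sketch. The snippet below is a minimal, illustrative example assuming scikit-learn and NumPy; the function name `filter_suspicious_samples`, the parameter defaults, and the rule of dropping any cluster smaller than a fixed fraction of the data are assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch of clustering-based filtering (illustrative only).
# Assumes `features` is an (n_samples, d) array of high-level features
# already extracted from a trusted, pre-trained model.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def filter_suspicious_samples(features, n_components=50, n_clusters=2,
                              outlier_fraction=0.15, random_state=0):
    """Return a boolean mask: True = keep sample, False = filter out."""
    # 1) Reduce feature dimensionality with PCA.
    reduced = PCA(n_components=min(n_components, features.shape[1]),
                  random_state=random_state).fit_transform(features)

    # 2) Fit a Gaussian Mixture Model and assign each sample to a cluster.
    gmm = GaussianMixture(n_components=n_clusters, random_state=random_state)
    labels = gmm.fit_predict(reduced)

    # 3) Flag outlying clusters. Simple heuristic (an assumption, not the
    #    paper's exact criterion): any cluster holding less than
    #    `outlier_fraction` of the samples is treated as suspicious.
    keep = np.ones(len(features), dtype=bool)
    for c in range(n_clusters):
        members = labels == c
        if members.mean() < outlier_fraction:
            keep[members] = False
    return keep
```

Samples marked False would be dropped before (re)training the downstream classifier; in practice the number of PCA components, the number of mixture components, and the outlier threshold would need to be tuned, e.g. on held-out clean data.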

Graphic Abstract

Source journal
Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
CiteScore: 9.60
Self-citation rate: 10.30%
Articles published: 297
Journal description: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.