{"title":"ACTSS: Input Detection Defense against Backdoor Attacks via Activation Subset Scanning","authors":"Yu Xuan, Xiaojun Chen, Zhendong Zhao, Yangyang Ding, Jianming Lv","doi":"10.1109/IJCNN55064.2022.9891900","journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)"},"publicationDate":"2022-07-18","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9891900"}
ACTSS: Input Detection Defense against Backdoor Attacks via Activation Subset Scanning
Deep neural networks are vulnerable to backdoor attacks, in which adversaries inject a trigger into part of the training data so that the trained model misclassifies triggered inputs. The poisoned model behaves normally on clean inputs, and the malicious behavior occurs only when the secret trigger is present, which makes backdoor attacks hard to detect. Most existing input detection methods rely on the link between triggers and outputs to reveal poisoned inputs, and they are limited by trigger size or fail in the “all-to-all” attack scenario. We show that, in a poisoned model, the internal activations produced by benign and poisoned inputs differ significantly. In this paper, we propose a novel run-time input detection algorithm, Activation Subset Scanning (ACTSS), which extracts the activations of incoming inputs and applies an anomaly detection algorithm to identify malicious inputs. Using nonparametric statistics, we search for and score the anomalous subset of activations according to the statistical difference between the activations of benign and poisoned data. Extensive experiments are conducted on three public datasets (CIFAR10, GTSRB, and ImageNet) with three representative models. The results verify the effectiveness and state-of-the-art performance of our approach, which achieves a false rejection rate of over 98% for different types of triggers.
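The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes activations are already extracted as NumPy arrays, converts each node's activation to an empirical p-value against benign activations, and scores the subset of nodes with unusually small p-values using a Berk-Jones statistic maximized over significance thresholds. The function names, the choice of Berk-Jones, the simplified threshold scan, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def empirical_pvalues(benign_acts, test_acts):
    """Per-node one-sided empirical p-value: share of benign activations
    that are at least as large as the test input's activation."""
    n_benign = benign_acts.shape[0]
    exceed = (benign_acts >= test_acts[None, :]).sum(axis=0)
    return (exceed + 1.0) / (n_benign + 1.0)

def berk_jones(n_alpha, n, alpha):
    """Berk-Jones statistic: n * KL(n_alpha / n || alpha), or 0 when the
    observed fraction of significant p-values does not exceed alpha."""
    frac = n_alpha / n
    if frac <= alpha:
        return 0.0
    if frac >= 1.0:
        return n * np.log(1.0 / alpha)
    return n * (frac * np.log(frac / alpha)
                + (1.0 - frac) * np.log((1.0 - frac) / (1.0 - alpha)))

def scan_activations(pvalues, alpha_max=0.5):
    """Simplified scan: for each threshold alpha, score the nodes with
    p-value <= alpha and keep the best-scoring threshold and subset."""
    n = pvalues.size
    best_score, best_alpha = 0.0, None
    for alpha in np.unique(pvalues[pvalues <= alpha_max]):
        n_alpha = int((pvalues <= alpha).sum())
        score = berk_jones(n_alpha, n, alpha)
        if score > best_score:
            best_score, best_alpha = score, alpha
    if best_alpha is None:
        return best_score, np.array([], dtype=int)
    return best_score, np.flatnonzero(pvalues <= best_alpha)

# Synthetic stand-in for one layer's activations (hypothetical data).
rng = np.random.default_rng(0)
benign = rng.normal(size=(500, 64))   # 500 benign inputs, 64 nodes
clean_input = rng.normal(size=64)
poisoned_input = rng.normal(size=64)
poisoned_input[:8] += 10.0            # assumed effect of a trigger on 8 nodes

clean_score, _ = scan_activations(empirical_pvalues(benign, clean_input))
poison_score, poison_subset = scan_activations(
    empirical_pvalues(benign, poisoned_input))
print("clean score:", clean_score)
print("poisoned score:", poison_score)
print("flagged nodes:", poison_subset)
```

In a run-time setting, an input would be rejected when its score exceeds a threshold calibrated on held-out benign inputs; how that threshold is chosen in ACTSS is not stated in the abstract.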