Xueluan Gong, Yanjiao Chen, Wenbin Yang, Huayang Huang, Qian Wang
{"title":"B3:针对黑盒机器学习模型的后门攻击","authors":"Xueluan Gong, Yanjiao Chen, Wenbin Yang, Huayang Huang, Qian Wang","doi":"https://dl.acm.org/doi/10.1145/3605212","DOIUrl":null,"url":null,"abstract":"<p>Backdoor attacks aim to inject backdoors to victim machine learning models during training time, such that the backdoored model maintains the prediction power of the original model towards clean inputs and misbehaves towards backdoored inputs with the trigger. The reason for backdoor attacks is that resource-limited users usually download sophisticated models from model zoos or query the models from MLaaS rather than training a model from scratch, thus a malicious third party has a chance to provide a backdoored model. In general, the more precious the model provided (i.e., models trained on rare datasets), the more popular it is with users. </p><p>In this paper, from a malicious model provider perspective, we propose a black-box backdoor attack, named <span>B<sup>3</sup></span>, where neither the rare victim model (including the model architecture, parameters, and hyperparameters) nor the training data is available to the adversary. To facilitate backdoor attacks in the black-box scenario, we design a cost-effective <i>model extraction</i> method that leverages a carefully-constructed query dataset to steal the functionality of the victim model with a limited budget. As the trigger is key to successful backdoor attacks, we develop a novel trigger generation algorithm that intensifies the bond between the trigger and the targeted misclassification label through the neuron with the highest impact on the targeted label. Extensive experiments have been conducted on various simulated deep learning models and the commercial API of Alibaba Cloud Compute Service. We demonstrate that <span>B<sup>3</sup></span> has a high attack success rate and maintains high prediction accuracy for benign inputs. It is also shown that <span>B<sup>3</sup></span> is robust against state-of-the-art defense strategies against backdoor attacks, such as model pruning and NC.</p>","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":"124 1","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"B3: Backdoor Attacks Against Black-Box Machine Learning Models\",\"authors\":\"Xueluan Gong, Yanjiao Chen, Wenbin Yang, Huayang Huang, Qian Wang\",\"doi\":\"https://dl.acm.org/doi/10.1145/3605212\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Backdoor attacks aim to inject backdoors to victim machine learning models during training time, such that the backdoored model maintains the prediction power of the original model towards clean inputs and misbehaves towards backdoored inputs with the trigger. The reason for backdoor attacks is that resource-limited users usually download sophisticated models from model zoos or query the models from MLaaS rather than training a model from scratch, thus a malicious third party has a chance to provide a backdoored model. In general, the more precious the model provided (i.e., models trained on rare datasets), the more popular it is with users. 
</p><p>In this paper, from a malicious model provider perspective, we propose a black-box backdoor attack, named <span>B<sup>3</sup></span>, where neither the rare victim model (including the model architecture, parameters, and hyperparameters) nor the training data is available to the adversary. To facilitate backdoor attacks in the black-box scenario, we design a cost-effective <i>model extraction</i> method that leverages a carefully-constructed query dataset to steal the functionality of the victim model with a limited budget. As the trigger is key to successful backdoor attacks, we develop a novel trigger generation algorithm that intensifies the bond between the trigger and the targeted misclassification label through the neuron with the highest impact on the targeted label. Extensive experiments have been conducted on various simulated deep learning models and the commercial API of Alibaba Cloud Compute Service. We demonstrate that <span>B<sup>3</sup></span> has a high attack success rate and maintains high prediction accuracy for benign inputs. It is also shown that <span>B<sup>3</sup></span> is robust against state-of-the-art defense strategies against backdoor attacks, such as model pruning and NC.</p>\",\"PeriodicalId\":56050,\"journal\":{\"name\":\"ACM Transactions on Privacy and Security\",\"volume\":\"124 1\",\"pages\":\"\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2023-06-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Privacy and Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/https://dl.acm.org/doi/10.1145/3605212\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Privacy and Security","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3605212","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
B3: Backdoor Attacks Against Black-Box Machine Learning Models
Backdoor attacks aim to inject backdoors into victim machine learning models during training, such that the backdoored model retains the original model's prediction power on clean inputs but misbehaves on inputs that contain the trigger. Backdoor attacks are feasible because resource-limited users usually download sophisticated models from model zoos or query models from MLaaS platforms rather than training a model from scratch, which gives a malicious third party the opportunity to supply a backdoored model. In general, the more precious the provided model (e.g., one trained on a rare dataset), the more popular it is with users.
In this paper, from the perspective of a malicious model provider, we propose a black-box backdoor attack, named B3, in which neither the rare victim model (including its architecture, parameters, and hyperparameters) nor the training data is available to the adversary. To facilitate backdoor attacks in this black-box scenario, we design a cost-effective model extraction method that leverages a carefully constructed query dataset to steal the functionality of the victim model within a limited query budget. As the trigger is key to a successful backdoor attack, we develop a novel trigger generation algorithm that strengthens the bond between the trigger and the targeted misclassification label through the neuron with the highest impact on that label. Extensive experiments have been conducted on various simulated deep learning models and on the commercial API of Alibaba Cloud Compute Service. We demonstrate that B3 achieves a high attack success rate while maintaining high prediction accuracy on benign inputs. We also show that B3 is robust against state-of-the-art defenses against backdoor attacks, such as model pruning and Neural Cleanse (NC).
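The abstract describes two building blocks: a query-based model extraction step that distils the black-box victim into a local surrogate, and a trigger crafted around the neuron with the highest impact on the target label. The sketch below is a minimal PyTorch illustration of how such a pipeline might look; it is not the authors' code, and the helper names (`victim_query`, `surrogate.features`, `surrogate.classifier`) as well as all hyperparameters are assumptions made for the example.

```python
# Illustrative sketch (our own assumptions, not the paper's implementation):
# (1) distil a black-box victim into a local surrogate via queries, and
# (2) optimise a trigger that excites the neuron most strongly tied to the
#     target label in the surrogate's final layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


def extract_surrogate(victim_query, surrogate, query_loader, epochs=10, lr=1e-3):
    """Train a local surrogate from black-box query responses.

    victim_query: callable returning the victim's output probabilities for a
                  batch of inputs (e.g. a wrapper around an MLaaS API).
    surrogate:    a locally trainable nn.Module with the same label space.
    query_loader: iterator over the carefully constructed query dataset.
    """
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        for x in query_loader:
            with torch.no_grad():
                soft_labels = victim_query(x)              # black-box query
            opt.zero_grad()
            loss = F.kl_div(F.log_softmax(surrogate(x), dim=1),
                            soft_labels, reduction="batchmean")
            loss.backward()
            opt.step()
    return surrogate


def craft_trigger(surrogate, target_label, shape=(3, 32, 32), steps=500, lr=0.1):
    """Optimise a trigger pattern (assumed to match the surrogate's input size)
    that maximises the activation of the penultimate-layer neuron with the
    largest final-layer weight towards the target label."""
    surrogate.eval()
    for p in surrogate.parameters():
        p.requires_grad_(False)                            # only the trigger is trained

    # Neuron with the highest impact on the target label, measured here by the
    # magnitude of the final linear layer's weights (an assumption).
    final_fc = surrogate.classifier                        # assumed last nn.Linear
    neuron = final_fc.weight[target_label].abs().argmax().item()

    trigger = torch.rand(1, *shape, requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feats = surrogate.features(trigger)                # penultimate activations
        loss = -feats.flatten(1)[:, neuron].mean()         # maximise the activation
        loss.backward()
        opt.step()
        trigger.data.clamp_(0, 1)                          # keep a valid image pattern
    return trigger.detach()
```

In a typical backdoor pipeline, such a trigger would then be stamped onto clean samples relabeled with the target class, so that the bond between trigger and target label carries over to the deployed model; the exact poisoning procedure used by B3 is described in the paper itself.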
About the journal:
ACM Transactions on Privacy and Security (TOPS) (formerly known as TISSEC) publishes high-quality research results in the fields of information and system security and privacy. Studies addressing all aspects of these fields are welcomed, ranging from technologies, to systems and applications, to the crafting of policies.