{"title":"Black-Box Universal Adversarial Attack on Text Classifiers","authors":"Yu Zhang, Kun Shao, Junan Yang, H. Liu","doi":"10.1109/ACCC54619.2021.00007","DOIUrl":null,"url":null,"abstract":"Adversarial examples reveal the fragility of deep learning models. Recent studies have shown that deep learning models are also vulnerable to universal adversarial perturbations. When the input-agnostic sequence of words concatenated to any input instance in the data set, it fools the model to produce a specific prediction in [9] and [10]. Despite being highly successful, they often need to obtain the gradient information of the target model. However, under more realistic black box conditions, we can only manipulate the input and output of the target model, which brings great difficulties to the search for universal adversarial disturbances. Therefore, to explore whether universal adversarial attacks can be realized under black-box conditions, we study a universal adversarial perturbation search method based on optimization. We conducted exhaustive experiments to prove the effectiveness of our attack model by attacking the Bi-LSTM and BERT models on sentiment classification tasks.","PeriodicalId":215546,"journal":{"name":"2021 2nd Asia Conference on Computers and Communications (ACCC)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 2nd Asia Conference on Computers and Communications (ACCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACCC54619.2021.00007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Adversarial examples reveal the fragility of deep learning models. Recent studies have shown that deep learning models are also vulnerable to universal adversarial perturbations: in [9] and [10], an input-agnostic sequence of words, when concatenated to any input instance in the data set, fools the model into producing a specific prediction. Although highly successful, these attacks typically require access to the gradients of the target model. Under more realistic black-box conditions, however, an attacker can only manipulate the inputs and observe the outputs of the target model, which makes the search for universal adversarial perturbations considerably harder. Therefore, to explore whether universal adversarial attacks can be realized under black-box conditions, we study an optimization-based search method for universal adversarial perturbations. We conduct extensive experiments that demonstrate the effectiveness of our attack by attacking Bi-LSTM and BERT models on sentiment classification tasks.
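The abstract does not spell out the optimization procedure, so the following is only a minimal illustrative sketch of the general idea it describes: searching for an input-agnostic trigger using nothing but the target model's inputs and outputs (no gradients). The `predict` function, the `vocab` candidate list, and the hill-climbing loop are all assumptions for illustration, not the paper's actual algorithm.

```python
import random

# Sketch of a query-only (black-box) universal trigger search.
# `predict` is a hypothetical black box mapping a list of texts to labels;
# `vocab` is a candidate word list. Neither comes from the paper.

def fooling_rate(predict, texts, labels, trigger):
    """Fraction of inputs whose prediction changes after prepending the trigger."""
    perturbed = [" ".join(trigger) + " " + t for t in texts]
    preds = predict(perturbed)
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)

def search_universal_trigger(predict, texts, labels, vocab,
                             trigger_len=3, iters=200, seed=0):
    """Hill-climbing over trigger words using only model outputs (no gradients)."""
    rng = random.Random(seed)
    trigger = [rng.choice(vocab) for _ in range(trigger_len)]
    best = fooling_rate(predict, texts, labels, trigger)
    for _ in range(iters):
        pos = rng.randrange(trigger_len)       # pick one trigger slot to mutate
        candidate = trigger[:]
        candidate[pos] = rng.choice(vocab)     # propose a replacement word
        score = fooling_rate(predict, texts, labels, candidate)
        if score > best:                       # keep only improving proposals
            trigger, best = candidate, score
    return trigger, best
```

The point of the sketch is that scoring a candidate trigger only requires querying the model on perturbed inputs, which is exactly the information available under the black-box setting the abstract describes.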