Experimental Evaluation of Adversarial Attacks Against Natural Language Machine Learning Models
Jonathan Li, Steven Pugh, Honghe Zhou, Lin Deng, J. Dehlinger, Suranjan Chakraborty
2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA), published May 23, 2023
DOI: 10.1109/SERA57763.2023.10197813
Citations: 0
Abstract
Machine learning models are increasingly relied on for many natural language processing tasks. However, these models are vulnerable to adversarial attacks, i.e., inputs designed to trick a model into making a wrong prediction. Among the different methods of attacking a model, it is important to understand which attacks are effective so that we can design countermeasures to protect the models. In this paper, we design and implement six adversarial attacks against natural language machine learning models. We then evaluate the effectiveness of these attacks using a fine-tuned distilled BERT model and 5,000 sample sentences from the SST-2 dataset. Our results indicate that the Word-replace attack affected the model the most, reducing its F1-score by 34%. The Word-delete attack was the least effective, but still reduced the model's accuracy by 17%. Based on the experimental results, we discuss our insights and provide recommendations for building robust natural language machine learning models.
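The paper itself is not reproduced here, but the abstract's Word-delete and Word-replace attacks can be illustrated with a minimal sketch. The checkpoint name, the tiny substitution vocabulary, and the perturbation details below are illustrative assumptions, not the authors' implementation; the publicly available DistilBERT model fine-tuned on SST-2 is used as a stand-in for the fine-tuned distilled BERT model mentioned in the abstract.

```python
# Illustrative sketch only: crude word-delete and word-replace perturbations
# applied to inputs of an SST-2 sentiment classifier. Attack details are
# assumptions for demonstration, not the paper's implementation.
import random

from transformers import pipeline

# Publicly available DistilBERT checkpoint fine-tuned on SST-2
# (assumed stand-in for the model evaluated in the paper).
clf = pipeline("sentiment-analysis",
               model="distilbert-base-uncased-finetuned-sst-2-english")


def word_delete(sentence: str) -> str:
    """Drop one randomly chosen word from the sentence."""
    words = sentence.split()
    if len(words) <= 1:
        return sentence
    del words[random.randrange(len(words))]
    return " ".join(words)


def word_replace(sentence: str,
                 vocab=("movie", "thing", "okay", "bad")) -> str:
    """Replace one randomly chosen word with a word from a small vocabulary
    (a crude stand-in for a real substitution strategy)."""
    words = sentence.split()
    if not words:
        return sentence
    words[random.randrange(len(words))] = random.choice(vocab)
    return " ".join(words)


if __name__ == "__main__":
    original = "a gorgeous, witty, seductive movie"
    for attack in (word_delete, word_replace):
        perturbed = attack(original)
        before = clf(original)[0]
        after = clf(perturbed)[0]
        print(f"{attack.__name__}: '{perturbed}'")
        print(f"  before: {before['label']} ({before['score']:.2f})"
              f"  after: {after['label']} ({after['score']:.2f})")
```

In an evaluation like the one described in the abstract, each attack would be applied to the 5,000 SST-2 sample sentences and the resulting drop in accuracy and F1-score measured against the unperturbed baseline.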