TextCheater: A Query-Efficient Textual Adversarial Attack in the Hard-Label Setting

IF 7.0, Zone 2 (Computer Science), JCR Q1 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

IEEE Transactions on Dependable and Secure Computing. Pub Date: 2024-07-01. DOI: 10.1109/TDSC.2023.3339802

Hao Peng, Shixin Guo, Dandan Zhao, Xuhong Zhang, Jianmin Han, Shoulin Ji, Xing Yang, Ming-Hong Zhong


Citations: 0

Abstract

Designing a query-efficient attack strategy to generate high-quality adversarial examples under the hard-label black-box setting is a fundamental yet challenging problem, especially in natural language processing (NLP). The process of searching for adversarial examples has many uncertainties (e.g., an unknown impact on the target model's prediction of the added perturbation) when confidence scores cannot be accessed, which must be compensated for with a large number of queries. To address this issue, we propose TextCheater, a decision-based metaheuristic search method that performs a query-efficient textual adversarial attack task by prohibiting invalid searches. The strategies of multiple initialization points and Tabu search are also introduced to keep the search process from falling into a local optimum. We apply our approach to three state-of-the-art language models (i.e., BERT, wordLSTM, and wordCNN) across six benchmark datasets and eight real-world commercial sentiment analysis platforms/models. Furthermore, we evaluate the Robustly optimized BERT pretraining Approach (RoBERTa) and models that enhance their robustness by adversarial training on toxicity detection and text classification tasks. The results demonstrate that our method minimizes the number of queries required for crafting plausible adversarial text while outperforming existing attack methods in the attack success rate, fluency of output sentences, and similarity between the original text and its adversary.
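The abstract names three ingredients: a hard-label (decision-only) oracle, several initialization points, and a Tabu list that prohibits searches already known to be invalid. The toy sketch below illustrates how those ideas can fit together; it is not the TextCheater algorithm, whose details the abstract does not give. The synonym table, the keyword classifier, and every function name (`HardLabelOracle`, `random_init`, `tabu_reduce`, `attack`) are hypothetical and chosen only for illustration.

```python
import random
from collections import deque

# Hypothetical toy data: neither the synonyms nor the classifier come from the paper.
SYNONYMS = {
    "good": ["fine", "decent"],
    "great": ["big", "grand"],
    "movie": ["film", "picture"],
}
POSITIVE = {"good", "great"}


class HardLabelOracle:
    """Exposes only the predicted label (no confidence scores) and counts queries."""

    def __init__(self):
        self.queries = 0

    def label(self, words):
        self.queries += 1
        return 1 if any(w in POSITIVE for w in words) else 0


def changed(orig, cand):
    """Number of word positions that differ from the original text."""
    return sum(a != b for a, b in zip(orig, cand))


def random_init(orig, oracle, true_label, rng, tries=20):
    """Substitute every replaceable word at random until the label flips."""
    slots = [i for i, w in enumerate(orig) if w in SYNONYMS]
    for _ in range(tries):
        cand = list(orig)
        for i in slots:
            cand[i] = rng.choice(SYNONYMS[orig[i]])
        if oracle.label(cand) != true_label:
            return cand
    return None


def tabu_reduce(orig, start, oracle, true_label, tenure=5, max_iters=50):
    """Restore original words one at a time while the text stays adversarial.

    Moves that broke the attack go on a bounded tabu list, so they are not
    queried again while they remain on it.
    """
    best = list(start)
    tabu = deque(maxlen=tenure)
    for _ in range(max_iters):
        moves = [(i, orig[i]) for i in range(len(orig))
                 if best[i] != orig[i] and (i, orig[i]) not in tabu]
        if not moves:
            break
        i, w = moves[0]
        cand = list(best)
        cand[i] = w
        if oracle.label(cand) != true_label:
            best = cand              # still adversarial, one fewer change
        else:
            tabu.append((i, w))      # invalid move: prohibit repeating it
    return best


def attack(orig, true_label, restarts=3, seed=0):
    """Run the reduction from several random starting points; keep the best."""
    rng = random.Random(seed)
    oracle = HardLabelOracle()
    best = None
    for _ in range(restarts):
        start = random_init(orig, oracle, true_label, rng)
        if start is None:
            continue
        cand = tabu_reduce(orig, start, oracle, true_label)
        if best is None or changed(orig, cand) < changed(orig, best):
            best = cand
    return best, oracle.queries


if __name__ == "__main__":
    adversarial, queries = attack("a good movie".split(), true_label=1)
    print(adversarial, queries)
```

In this toy setting the tabu list saves queries by never retrying a restoration that is known to undo the label flip, and the restarts give the search more than one starting point. A real attack would additionally need semantic-similarity and fluency constraints, which are omitted here.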


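Source journal

IEEE Transactions on Dependable and Secure Computing (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 11.20

Self-citation rate: 5.50%

Articles published: 354

Review time: 9 months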

About the journal: The "IEEE Transactions on Dependable and Secure Computing (TDSC)" is a prestigious journal that publishes high-quality, peer-reviewed research in the field of computer science, specifically targeting the development of dependable and secure computing systems and networks. This journal is dedicated to exploring the fundamental principles, methodologies, and mechanisms that enable the design, modeling, and evaluation of systems that meet the required levels of reliability, security, and performance. The scope of TDSC includes research on measurement, modeling, and simulation techniques that contribute to the understanding and improvement of system performance under various constraints. It also covers the foundations necessary for the joint evaluation, verification, and design of systems that balance performance, security, and dependability. By publishing archival research results, TDSC aims to provide a valuable resource for researchers, engineers, and practitioners working in the areas of cybersecurity, fault tolerance, and system reliability. The journal's focus on cutting-edge research ensures that it remains at the forefront of advancements in the field, promoting the development of technologies that are critical for the functioning of modern, complex systems.