Masked Language Model Based Textual Adversarial Example Detection

Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security. Pub Date: 2023-04-18. DOI: 10.1145/3579856.3590339

Xiaomei Zhang, Zhaoxi Zhang, Qi Zhong, Xufei Zheng, Yanjun Zhang, Shengshan Hu, L. Zhang

Abstract

Adversarial attacks are a serious threat to the reliable deployment of machine learning models in safety-critical applications. By slightly modifying the inputs, they can mislead current models into making incorrect predictions. Recently, substantial work has shown that adversarial examples tend to deviate from the underlying data manifold of normal examples, whereas pre-trained masked language models can fit the manifold of normal NLP data. To explore how the masked language model can be used in adversarial detection, we propose a novel textual adversarial example detection method, Masked Language Model-based Detection (MLMD), which produces clearly distinguishable signals between normal and adversarial examples by exploring the changes in manifolds induced by the masked language model. MLMD is plug-and-play for adversarial defense (i.e., the victim model does not need to be retrained), and it is agnostic to the classification task, the victim model's architecture, and the attack method being defended against. We evaluate MLMD on various benchmark textual datasets, widely studied machine learning models, and state-of-the-art (SOTA) adversarial attacks (3 × 4 × 4 = 48 settings in total). Experimental results show that MLMD achieves strong performance, with detection accuracy of up to 0.984, 0.967, and 0.901 on the AG-NEWS, IMDB, and SST-2 datasets, respectively. Additionally, MLMD is superior to, or at least comparable with, the SOTA detection defenses in detection accuracy and F1 score. Among the many defenses based on the off-manifold assumption of adversarial examples, this work offers a new angle for capturing the manifold change. The code for this work is openly accessible at https://github.com/mlmddetection/MLMDdetection.
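The abstract does not spell out how the detection signal is computed, so the following is only a minimal sketch of one plausible mask-then-refill check, not the authors' MLMD implementation. The idea illustrated is that refilling masked tokens pulls an input back toward the normal data manifold, so the victim model's prediction should change more often for adversarial inputs than for normal ones. Every name here (`mask_and_refill_score`, `is_adversarial`, `toy_victim_predict`, `toy_mlm_fill`, the 0.2 threshold) is a hypothetical stand-in, and the toy classifier and toy "language model" replace a real victim model and a real pre-trained masked language model.

```python
from typing import Callable, List, Optional

Tokens = List[str]
MASK = "[MASK]"


def mask_and_refill_score(
    tokens: Tokens,
    victim_predict: Callable[[Tokens], int],
    mlm_fill: Callable[[Tokens, int], str],
) -> float:
    """Fraction of single-token mask-and-refill variants whose victim
    label differs from the label of the unmodified input."""
    if not tokens:
        return 0.0
    original_label = victim_predict(tokens)
    flips = 0
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        # Replace the mask with the language model's proposal for position i.
        refilled = tokens[:i] + [mlm_fill(masked, i)] + tokens[i + 1:]
        if victim_predict(refilled) != original_label:
            flips += 1
    return flips / len(tokens)


def is_adversarial(score: float, threshold: float = 0.2) -> bool:
    """Flag an input when its flip rate exceeds a threshold (assumed value)."""
    return score > threshold


# --- Toy stand-ins (assumptions for illustration only) ---
POSITIVE = {"great"}
NEGATIVE = {"flick"}  # a spurious cue the toy victim has wrongly learned


def toy_victim_predict(tokens: Tokens) -> int:
    """Return 1 (positive) if positive cues outnumber negative cues, else 0."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 1 if pos > neg else 0


# Toy "masked language model": predicts a token from the previous token only.
NEXT_WORD = {None: "the", "the": "movie", "movie": "was",
             "flick": "was", "was": "great"}


def toy_mlm_fill(masked_tokens: Tokens, position: int) -> str:
    prev: Optional[str] = masked_tokens[position - 1] if position > 0 else None
    return NEXT_WORD.get(prev, "[UNK]")


normal = ["the", "movie", "was", "great"]
adversarial = ["the", "flick", "was", "great"]  # one word swapped by an attacker

normal_score = mask_and_refill_score(normal, toy_victim_predict, toy_mlm_fill)
adv_score = mask_and_refill_score(adversarial, toy_victim_predict, toy_mlm_fill)
print(normal_score, is_adversarial(normal_score))  # 0.0 False
print(adv_score, is_adversarial(adv_score))        # 0.25 True
```

In this toy setting, refilling the swapped word restores the original prediction, which is what separates the two inputs. With a real masked language model and victim classifier, the threshold would have to be chosen on held-out data rather than fixed in advance.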
