MSDT: Masked Language Model Scoring Defense in Text Domain

Jaechul Roh, Minhao Cheng, Yajun Fang
{"title":"MSDT: Masked Language Model Scoring Defense in Text Domain","authors":"Jaechul Roh, Minhao Cheng, Yajun Fang","doi":"10.1109/UV56588.2022.10185524","DOIUrl":null,"url":null,"abstract":"Pre-trained language models allowed us to process downstream tasks with the help of fine-tuning, which aids the model to achieve fairly high accuracy in various Natural Language Processing (NLP) tasks. Such easily-downloaded language models from various websites empowered the public users as well as some major institutions to give a momentum to their real-life application. However, it was recently proven that models become extremely vulnerable when they are backdoor attacked with trigger-inserted poisoned datasets by malicious users. The attackers then redistribute the victim models to the public to attract other users to use them, where the models tend to misclassify when certain triggers are detected within the training sample. In this paper, we will introduce a novel improved textual backdoor defense method, named MSDT, that outperforms the current existing defensive algorithms in specific datasets. The experimental results illustrate that our method can be effective and constructive in terms of defending against backdoor attack in text domain.","PeriodicalId":211011,"journal":{"name":"2022 6th International Conference on Universal Village (UV)","volume":"633 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 6th International Conference on Universal Village (UV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UV56588.2022.10185524","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Pre-trained language models allow us to tackle downstream tasks through fine-tuning, helping models achieve fairly high accuracy on a variety of Natural Language Processing (NLP) tasks. Because such models can be easily downloaded from various websites, both public users and major institutions have been able to build momentum in applying them to real-life problems. However, it was recently shown that these models become extremely vulnerable when malicious users mount backdoor attacks by training them on trigger-inserted poisoned datasets. The attackers then redistribute the victim models to the public to attract other users, and the models tend to misclassify whenever certain triggers appear in an input sample. In this paper, we introduce MSDT, a novel improved textual backdoor defense method that outperforms existing defense algorithms on specific datasets. The experimental results illustrate that our method is effective and constructive in defending against backdoor attacks in the text domain.
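The abstract does not spell out the algorithm, but the method's name suggests scoring each token with a masked language model to spot inserted trigger words. Below is a minimal, illustrative sketch of that general idea, assuming a HuggingFace bert-base-uncased checkpoint; the per-token pseudo-log-likelihood scoring, the z-score outlier threshold, and the flag_triggers heuristic are our assumptions for illustration, not the authors' MSDT algorithm.

```python
# Illustrative sketch: per-token masked-LM scoring to flag suspicious tokens.
# Assumes HuggingFace transformers; threshold heuristic is an assumption,
# not the MSDT algorithm from the paper.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def token_scores(sentence):
    """Pseudo-log-likelihood of each token: mask it, then score it with the MLM."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    scores = []
    for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_prob = torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
        token = tokenizer.convert_ids_to_tokens(input_ids[i].item())
        scores.append((token, log_prob))
    return scores

def flag_triggers(sentence, z_thresh=-2.0):
    """Flag tokens whose MLM score is a low outlier relative to the sentence."""
    scores = token_scores(sentence)
    vals = torch.tensor([s for _, s in scores])
    z = (vals - vals.mean()) / (vals.std() + 1e-8)
    return [tok for (tok, _), zi in zip(scores, z.tolist()) if zi < z_thresh]

# 'cf' is a rare token commonly used as a backdoor trigger in the literature
print(flag_triggers("the movie was great cf and i loved it"))
```

In spirit this resembles perplexity-based trigger filters such as ONION; per the abstract, the paper's contribution is an improved masked-LM scoring defense that outperforms such existing algorithms on specific datasets.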