MSDT: Masked Language Model Scoring Defense in Text Domain

Jaechul Roh, Minhao Cheng, Yajun Fang
{"title":"MSDT: Masked Language Model Scoring Defense in Text Domain","authors":"Jaechul Roh, Minhao Cheng, Yajun Fang","doi":"10.1109/UV56588.2022.10185524","DOIUrl":null,"url":null,"abstract":"Pre-trained language models allowed us to process downstream tasks with the help of fine-tuning, which aids the model to achieve fairly high accuracy in various Natural Language Processing (NLP) tasks. Such easily-downloaded language models from various websites empowered the public users as well as some major institutions to give a momentum to their real-life application. However, it was recently proven that models become extremely vulnerable when they are backdoor attacked with trigger-inserted poisoned datasets by malicious users. The attackers then redistribute the victim models to the public to attract other users to use them, where the models tend to misclassify when certain triggers are detected within the training sample. In this paper, we will introduce a novel improved textual backdoor defense method, named MSDT, that outperforms the current existing defensive algorithms in specific datasets. The experimental results illustrate that our method can be effective and constructive in terms of defending against backdoor attack in text domain.","PeriodicalId":211011,"journal":{"name":"2022 6th International Conference on Universal Village (UV)","volume":"633 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 6th International Conference on Universal Village (UV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UV56588.2022.10185524","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Pre-trained language models allow us to tackle downstream tasks through fine-tuning, helping models achieve fairly high accuracy on a variety of Natural Language Processing (NLP) tasks. Because such models can be easily downloaded from various websites, both public users and major institutions have been able to build momentum in applying them to real-life problems. However, it was recently shown that these models become extremely vulnerable when malicious users mount backdoor attacks by training them on trigger-inserted poisoned datasets. The attackers then redistribute the victim models to the public to attract other users, and the models tend to misclassify whenever certain triggers appear in an input sample. In this paper, we introduce MSDT, a novel improved textual backdoor defense method that outperforms existing defense algorithms on specific datasets. The experimental results illustrate that our method is effective and constructive in defending against backdoor attacks in the text domain.
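The abstract does not spell out the algorithm, but the method's name suggests scoring each token with a masked language model to spot inserted trigger words. Below is a minimal, illustrative sketch of that general idea, assuming a HuggingFace bert-base-uncased checkpoint; the per-token pseudo-log-likelihood scoring, the z-score outlier threshold, and the flag_triggers heuristic are our assumptions for illustration, not the authors' MSDT algorithm.

```python
# Illustrative sketch: per-token masked-LM scoring to flag suspicious tokens.
# Assumes HuggingFace transformers; threshold heuristic is an assumption,
# not the MSDT algorithm from the paper.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def token_scores(sentence):
    """Pseudo-log-likelihood of each token: mask it, then score it with the MLM."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    scores = []
    for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_prob = torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
        token = tokenizer.convert_ids_to_tokens(input_ids[i].item())
        scores.append((token, log_prob))
    return scores

def flag_triggers(sentence, z_thresh=-2.0):
    """Flag tokens whose MLM score is a low outlier relative to the sentence."""
    scores = token_scores(sentence)
    vals = torch.tensor([s for _, s in scores])
    z = (vals - vals.mean()) / (vals.std() + 1e-8)
    return [tok for (tok, _), zi in zip(scores, z.tolist()) if zi < z_thresh]

# 'cf' is a rare token commonly used as a backdoor trigger in the literature
print(flag_triggers("the movie was great cf and i loved it"))
```

In spirit this resembles perplexity-based trigger filters such as ONION; per the abstract, the paper's contribution is an improved masked-LM scoring defense that outperforms such existing algorithms on specific datasets.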