CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models

Xinyu He, Fengrui Hao, Tianlong Gu, Liang Chang
{"title":"CBAs:针对中文预训练语言模型的字符级后门攻击","authors":"Xinyu He, Fengrui Hao, Tianlong Gu, Liang Chang","doi":"10.1145/3678007","DOIUrl":null,"url":null,"abstract":"The pre-trained language models (PLMs) aim to assist computers in various domains to provide natural and efficient language interaction and text processing capabilities. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, where triggers could be injected into the models to guide them to exhibit the expected behavior of the attackers. Unfortunately, existing researches on backdoor attacks have mainly focused on English PLMs, but paid less attention to the Chinese PLMs. Moreover, these extant backdoor attacks don’t work well against Chinese PLMs. In this paper, we disclose the limitations of English backdoor attacks against Chinese PLMs, and propose the character-level backdoor attacks (CBAs) against the Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies to ensure the backdoor being effectively triggered while improving the effectiveness of the backdoor attacks. Then, based on the attacker’s capabilities of accessing the training dataset, we develop trigger injection mechanisms with either the target label similarity or the masked language model, which select the most influential position and insert the trigger to maximize the stealth of backdoor attacks. Extensive experiments on three major natural language processing tasks in various Chinese PLMs and English PLMs demonstrate the effectiveness and stealthiness of our method. Besides, CBAs also have very strong resistance against three state-of-the-art backdoor defense methods.","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models\",\"authors\":\"Xinyu He, Fengrui Hao, Tianlong Gu, Liang Chang\",\"doi\":\"10.1145/3678007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The pre-trained language models (PLMs) aim to assist computers in various domains to provide natural and efficient language interaction and text processing capabilities. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, where triggers could be injected into the models to guide them to exhibit the expected behavior of the attackers. Unfortunately, existing researches on backdoor attacks have mainly focused on English PLMs, but paid less attention to the Chinese PLMs. Moreover, these extant backdoor attacks don’t work well against Chinese PLMs. In this paper, we disclose the limitations of English backdoor attacks against Chinese PLMs, and propose the character-level backdoor attacks (CBAs) against the Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies to ensure the backdoor being effectively triggered while improving the effectiveness of the backdoor attacks. Then, based on the attacker’s capabilities of accessing the training dataset, we develop trigger injection mechanisms with either the target label similarity or the masked language model, which select the most influential position and insert the trigger to maximize the stealth of backdoor attacks. 
Extensive experiments on three major natural language processing tasks in various Chinese PLMs and English PLMs demonstrate the effectiveness and stealthiness of our method. Besides, CBAs also have very strong resistance against three state-of-the-art backdoor defense methods.\",\"PeriodicalId\":3,\"journal\":{\"name\":\"ACS Applied Electronic Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Electronic Materials\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3678007\",\"RegionNum\":3,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3678007","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Pre-trained language models (PLMs) provide natural and efficient language interaction and text processing capabilities across a wide range of domains. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, in which triggers are injected into a model to make it exhibit behavior chosen by the attacker. Unfortunately, existing research on backdoor attacks has focused mainly on English PLMs and paid little attention to Chinese PLMs; moreover, these existing attacks do not transfer well to Chinese PLMs. In this paper, we expose the limitations of English backdoor attacks against Chinese PLMs and propose character-level backdoor attacks (CBAs) against Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies that ensure the backdoor is reliably triggered while improving the effectiveness of the attack. Then, depending on whether the attacker can access the training dataset, we develop trigger injection mechanisms based on either target-label similarity or a masked language model, which select the most influential position at which to insert the trigger, thereby maximizing the stealth of the attack. Extensive experiments on three major natural language processing tasks across various Chinese and English PLMs demonstrate the effectiveness and stealthiness of our method. Moreover, CBAs show strong resistance against three state-of-the-art backdoor defense methods.
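To make the masked-language-model injection mechanism concrete, the following is a minimal Python sketch, not the authors' implementation: it scores every candidate insertion position in a sentence by how plausible a Chinese MLM finds the trigger character at that spot, then inserts the trigger at the best-scoring position. The `bert-base-chinese` backbone, the single-character trigger, and raw-logit scoring are all assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the authors' code): pick the insertion
# position where a Chinese MLM finds the trigger character most plausible,
# approximating the "most influential position" selection described above.
import torch
from transformers import BertForMaskedLM, BertTokenizer

MODEL_NAME = "bert-base-chinese"  # assumed backbone; the paper may use others
tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def insert_trigger(sentence: str, trigger: str) -> str:
    """Insert a single-character trigger at the MLM's preferred position."""
    trigger_id = tokenizer.convert_tokens_to_ids(trigger)
    best_pos, best_score = 0, float("-inf")
    for pos in range(len(sentence) + 1):
        # Place [MASK] at the candidate insertion point.
        masked = sentence[:pos] + tokenizer.mask_token + sentence[pos:]
        inputs = tokenizer(masked, return_tensors="pt")
        mask_pos = (
            inputs["input_ids"][0] == tokenizer.mask_token_id
        ).nonzero(as_tuple=True)[0].item()
        with torch.no_grad():
            logits = model(**inputs).logits
        # Score = the MLM's logit for the trigger at the masked slot.
        score = logits[0, mask_pos, trigger_id].item()
        if score > best_score:
            best_pos, best_score = pos, score
    return sentence[:best_pos] + trigger + sentence[best_pos:]

# Hypothetical usage: poison a benign movie review with a rare-character trigger.
print(insert_trigger("这部电影的剧情很精彩", "殇"))
```

This sketch covers only the masked-language-model variant; per the abstract, the target-label-similarity mechanism is the alternative when the attacker can access the training dataset, and the three trigger generation strategies themselves are not detailed here.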