Syntactic paraphrase-based synthetic data generation for backdoor attacks against Chinese language models

IF 14.7 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion Pub Date : 2025-06-12 DOI:10.1016/j.inffus.2025.103376

Man Hu , Yatao Yang , Deng Pan , Zhongliang Guo , Luwei Xiao , Deyu Lin , Shuai Zhao

{"title":"Syntactic paraphrase-based synthetic data generation for backdoor attacks against Chinese language models","authors":"Man Hu , Yatao Yang , Deng Pan , Zhongliang Guo , Luwei Xiao , Deyu Lin , Shuai Zhao","doi":"10.1016/j.inffus.2025.103376","DOIUrl":null,"url":null,"abstract":"<div><div>Language Models (LMs) have shown significant advancements in various Natural Language Processing (NLP) tasks. However, recent studies indicate that LMs are particularly susceptible to malicious backdoor attacks, where attackers manipulate the models to exhibit specific behaviors when they encounter particular triggers. While existing research has focused on backdoor attacks against English LMs, Chinese LMs remain largely unexplored. Moreover, existing backdoor attacks against Chinese LMs exhibit limited stealthiness. In this paper, we investigate the high detectability of current backdoor attacks against Chinese LMs and propose a more stealthy backdoor attack method based on syntactic paraphrasing. Specifically, we leverage large language models (LLMs) to construct a syntactic paraphrasing mechanism that transforms benign inputs into poisoned samples with predefined syntactic structures. Subsequently, we exploit the syntactic structures of these poisoned samples as triggers to create more stealthy and robust backdoor attacks across various attack strategies. Extensive experiments conducted on three major NLP tasks with various Chinese PLMs and LLMs demonstrate that our method can achieve comparable attack performance (almost 100% success rate). Additionally, the poisoned samples generated by our method show lower perplexity and fewer grammatical errors compared to traditional character-level backdoor attacks. Furthermore, our method exhibits strong resistance against two state-of-the-art backdoor defense mechanisms.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"124 ","pages":"Article 103376"},"PeriodicalIF":14.7000,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S156625352500449X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Language Models (LMs) have shown significant advancements in various Natural Language Processing (NLP) tasks. However, recent studies indicate that LMs are particularly susceptible to malicious backdoor attacks, where attackers manipulate the models to exhibit specific behaviors when they encounter particular triggers. While existing research has focused on backdoor attacks against English LMs, Chinese LMs remain largely unexplored. Moreover, existing backdoor attacks against Chinese LMs exhibit limited stealthiness. In this paper, we investigate the high detectability of current backdoor attacks against Chinese LMs and propose a more stealthy backdoor attack method based on syntactic paraphrasing. Specifically, we leverage large language models (LLMs) to construct a syntactic paraphrasing mechanism that transforms benign inputs into poisoned samples with predefined syntactic structures. Subsequently, we exploit the syntactic structures of these poisoned samples as triggers to create more stealthy and robust backdoor attacks across various attack strategies. Extensive experiments conducted on three major NLP tasks with various Chinese PLMs and LLMs demonstrate that our method can achieve comparable attack performance (almost 100% success rate). Additionally, the poisoned samples generated by our method show lower perplexity and fewer grammatical errors compared to traditional character-level backdoor attacks. Furthermore, our method exhibits strong resistance against two state-of-the-art backdoor defense mechanisms.

查看原文本刊更多论文

基于句法释义的汉语模型后门攻击合成数据生成

语言模型（LMs）在各种自然语言处理（NLP）任务中显示出显著的进步。然而，最近的研究表明，LMs特别容易受到恶意后门攻击，攻击者在遇到特定触发器时操纵模型以显示特定行为。虽然现有的研究主要集中在针对英语LMs的后门攻击上，但中文LMs在很大程度上仍未被探索。此外，针对中国LMs的现有后门攻击表现出有限的隐蔽性。本文研究了当前针对中文lm的后门攻击的高可检测性，并提出了一种基于句法释义的更隐蔽的后门攻击方法。具体来说，我们利用大型语言模型（llm）来构建一种语法释义机制，将良性输入转换为具有预定义语法结构的有毒样本。随后，我们利用这些中毒样本的语法结构作为触发器，在各种攻击策略中创建更隐蔽和健壮的后门攻击。在使用各种中文plm和llm的三个主要NLP任务上进行的大量实验表明，我们的方法可以达到相当的攻击性能（几乎100%的成功率）。此外，与传统的字符级后门攻击相比，我们的方法生成的中毒样本显示出更低的困惑度和更少的语法错误。此外，我们的方法对两种最先进的后门防御机制具有很强的抵抗力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Information Fusion 工程技术-计算机：理论方法

CiteScore

33.20

自引率

4.30%

发文量

161

审稿时长

7.9 months

期刊介绍： Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.