Single Word Change Is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers

Lei Xu, Sarah Alnegheimish, Laure Berti-Equille, Alfredo Cuesta-Infante, Kalyan Veeramachaneni

Expert Systems, vol. 42, no. 8 (published 7 July 2025). DOI: 10.1111/exsy.70079
https://onlinelibrary.wiley.com/doi/10.1111/exsy.70079
Abstract
In text classification, creating an adversarial example means subtly perturbing a few words in a sentence without changing its meaning, causing a classifier to misclassify it. A concerning observation is that a significant portion of adversarial examples generated by existing methods change only one word. This single-word perturbation vulnerability represents a significant weakness in classifiers, which malicious users can exploit to efficiently create a multitude of adversarial examples. This paper studies this problem and makes the following key contributions: (1) We introduce a novel metric, ρ, to quantitatively assess a classifier's robustness against single-word perturbation. (2) We present SP-Attack, designed to exploit the single-word perturbation vulnerability; it achieves a higher attack success rate and better preserves sentence meaning while reducing computation costs compared to state-of-the-art adversarial methods. (3) We propose SP-Defence, which aims to improve ρ by applying data augmentation during learning. Experimental results on 4 datasets and 2 masked language models show that SP-Defence improves ρ by 14.6% and 13.9% and decreases the attack success rate of SP-Attack by 30.4% and 21.2% on the two classifiers respectively; it also decreases the attack success rate of existing attack methods that involve multiple-word perturbations.
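To make the single-word setting concrete, the sketch below shows one way a single-word substitution attack and a ρ-style robustness estimate could be implemented (ρ here taken as the fraction of sentences whose prediction no single-word substitution can flip). This is an illustrative assumption, not the authors' implementation: `classify` and `candidates` are hypothetical placeholders (the paper proposes substitutions via masked language models), and the paper's exact definition of ρ may differ.

```python
# Illustrative sketch of a single-word substitution attack and a
# rho-style robustness estimate. NOT the paper's implementation:
# `classify` and `candidates` are hypothetical placeholders.
from typing import Callable, List, Optional

def single_word_attack(sentence: str,
                       classify: Callable[[str], int],
                       candidates: Callable[[str], List[str]]) -> Optional[str]:
    """Return a sentence differing in exactly one word that flips the
    classifier's prediction, or None if no such substitution is found."""
    words = sentence.split()
    original_label = classify(sentence)
    for i, word in enumerate(words):
        for sub in candidates(word):          # e.g., synonyms or masked-LM proposals
            perturbed = " ".join(words[:i] + [sub] + words[i + 1:])
            if classify(perturbed) != original_label:
                return perturbed              # a one-word change misclassifies
    return None

def estimate_rho(sentences: List[str],
                 classify: Callable[[str], int],
                 candidates: Callable[[str], List[str]]) -> float:
    """Fraction of sentences whose prediction survives every attempted
    single-word substitution (higher = more robust)."""
    robust = sum(single_word_attack(s, classify, candidates) is None
                 for s in sentences)
    return robust / len(sentences)
```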
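Similarly, SP-Defence is described as improving ρ through data augmentation. The sketch below shows one plausible form, label-preserving single-word augmentation; this is an assumption for illustration, not the paper's exact procedure, and `candidates` is again a hypothetical placeholder.

```python
# Minimal sketch of label-preserving single-word data augmentation,
# one plausible reading of the SP-Defence idea (an assumption, not the
# paper's exact procedure). `candidates` is a hypothetical placeholder.
import random
from typing import Callable, List, Tuple

def augment_single_word(dataset: List[Tuple[str, int]],
                        candidates: Callable[[str], List[str]],
                        per_example: int = 2,
                        seed: int = 0) -> List[Tuple[str, int]]:
    """Return the dataset plus one-word variants that keep the original
    label, so training discourages single-word prediction flips."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for sentence, label in dataset:
        words = sentence.split()
        if not words:
            continue
        for _ in range(per_example):
            i = rng.randrange(len(words))
            subs = candidates(words[i])
            if not subs:
                continue
            variant = words[:i] + [rng.choice(subs)] + words[i + 1:]
            augmented.append((" ".join(variant), label))  # label unchanged
    return augmented
```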
Journal Overview
Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper.
As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we aim to serve new and growing markets for these technologies, such as Business, Economics, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emerging topics.