Single Word Change Is All You Need: Using LLMs to Create Synthetic Training Examples for Text Classifiers

Lei Xu, Sarah Alnegheimish, Laure Berti-Equille, Alfredo Cuesta-Infante, Kalyan Veeramachaneni

Expert Systems, vol. 42, no. 8 (published 7 July 2025). DOI: 10.1111/exsy.70079
https://onlinelibrary.wiley.com/doi/10.1111/exsy.70079
Abstract
In text classification, creating an adversarial example means subtly perturbing a few words in a sentence without changing its meaning, causing a classifier to misclassify it. A concerning observation is that a significant portion of adversarial examples generated by existing methods change only one word. This single-word perturbation vulnerability represents a significant weakness in classifiers, which malicious users can exploit to efficiently create a multitude of adversarial examples. This paper studies this problem and makes the following key contributions: (1) We introduce a novel metric, ρ, to quantitatively assess a classifier's robustness against single-word perturbation. (2) We present SP-Attack, designed to exploit the single-word perturbation vulnerability; it achieves a higher attack success rate and better preserves sentence meaning while reducing computation costs compared to state-of-the-art adversarial methods. (3) We propose SP-Defence, which aims to improve ρ by applying data augmentation during learning. Experimental results on 4 datasets and 2 masked language models show that SP-Defence improves ρ by 14.6% and 13.9% and decreases the attack success rate of SP-Attack by 30.4% and 21.2% on the two classifiers respectively; it also decreases the attack success rate of existing attack methods that involve multiple-word perturbations.
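To make the single-word setting concrete, the sketch below shows one way a single-word substitution attack and a ρ-style robustness estimate could be implemented (ρ here taken as the fraction of sentences whose prediction no single-word substitution can flip). This is an illustrative assumption, not the authors' implementation: `classify` and `candidates` are hypothetical placeholders (the paper proposes substitutions via masked language models), and the paper's exact definition of ρ may differ.

```python
# Illustrative sketch of a single-word substitution attack and a
# rho-style robustness estimate. NOT the paper's implementation:
# `classify` and `candidates` are hypothetical placeholders.
from typing import Callable, List, Optional

def single_word_attack(sentence: str,
                       classify: Callable[[str], int],
                       candidates: Callable[[str], List[str]]) -> Optional[str]:
    """Return a sentence differing in exactly one word that flips the
    classifier's prediction, or None if no such substitution is found."""
    words = sentence.split()
    original_label = classify(sentence)
    for i, word in enumerate(words):
        for sub in candidates(word):          # e.g., synonyms or masked-LM proposals
            perturbed = " ".join(words[:i] + [sub] + words[i + 1:])
            if classify(perturbed) != original_label:
                return perturbed              # a one-word change misclassifies
    return None

def estimate_rho(sentences: List[str],
                 classify: Callable[[str], int],
                 candidates: Callable[[str], List[str]]) -> float:
    """Fraction of sentences whose prediction survives every attempted
    single-word substitution (higher = more robust)."""
    robust = sum(single_word_attack(s, classify, candidates) is None
                 for s in sentences)
    return robust / len(sentences)
```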
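Similarly, SP-Defence is described as improving ρ through data augmentation. The sketch below shows one plausible form, label-preserving single-word augmentation; this is an assumption for illustration, not the paper's exact procedure, and `candidates` is again a hypothetical placeholder.

```python
# Minimal sketch of label-preserving single-word data augmentation,
# one plausible reading of the SP-Defence idea (an assumption, not the
# paper's exact procedure). `candidates` is a hypothetical placeholder.
import random
from typing import Callable, List, Tuple

def augment_single_word(dataset: List[Tuple[str, int]],
                        candidates: Callable[[str], List[str]],
                        per_example: int = 2,
                        seed: int = 0) -> List[Tuple[str, int]]:
    """Return the dataset plus one-word variants that keep the original
    label, so training discourages single-word prediction flips."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for sentence, label in dataset:
        words = sentence.split()
        if not words:
            continue
        for _ in range(per_example):
            i = rng.randrange(len(words))
            subs = candidates(words[i])
            if not subs:
                continue
            variant = words[:i] + [rng.choice(subs)] + words[i + 1:]
            augmented.append((" ".join(variant), label))  # label unchanged
    return augmented
```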
Journal Overview
Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper.
As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we aim to serve new and growing markets for these technologies, such as Business, Economics, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emerging topics.