Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation

International Joint Conference on Artificial Intelligence | Pub Date: 2023-08-01 | DOI: 10.24963/ijcai.2023/569

Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu, Honglong Chen


Abstract

Modern Natural Language Processing (NLP) models are under-sensitive to text rubbish examples. A text rubbish example is a heavily modified input text that is nonsensical to humans but does not change the model’s prediction. Prior work crafts rubbish examples by iteratively deleting words, with the deletion order determined by beam search. However, the resulting rubbish examples usually reduce model confidence and sometimes remain human-readable. To address these problems, we propose an Annealing Genetic-based Preposition Substitution (AGPS) algorithm for text rubbish example generation, which has two major merits. First, AGPS crafts rubbish text examples by substituting input words with meaningless prepositions instead of removing them directly, which degrades the model’s confidence less. Second, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to escape local optima with some probability. This is important for achieving better objectives, i.e., a high word modification rate and high model confidence. Experimental results on five popular datasets demonstrate the superiority of AGPS over the baseline and expose the fact that NLP models cannot really understand the semantics of sentences, as they give the same prediction with even higher confidence for nonsensical preposition sequences.
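The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a toy lexicon classifier (`toy_model`), a small preposition pool, an equal-weight fitness (modification rate plus confidence), and a simulated-annealing acceptance rule layered on a basic genetic search over word replacement orders. All names, operators, and parameters here are hypothetical.

```python
import math
import random

PREPOSITIONS = ["of", "in", "on", "at", "by", "to"]  # assumed substitution pool
WEIGHTS = {"love": 2.0, "great": 1.0, "hate": -2.0, "awful": -1.0}  # toy lexicon

def toy_model(tokens):
    """Stand-in classifier returning (label, confidence); not a real NLP model."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return (1 if score >= 0 else 0), 1.0 / (1.0 + math.exp(-abs(score)))

def apply_order(tokens, order):
    """Substitute words in priority order, keeping only label-preserving edits."""
    label, _ = toy_model(tokens)
    out = list(tokens)
    for i in order:
        trial = out[:i] + [PREPOSITIONS[i % len(PREPOSITIONS)]] + out[i + 1:]
        if toy_model(trial)[0] == label:
            out = trial
    return out

def fitness(tokens, order):
    """Modification rate plus model confidence (an assumed equal-weight objective)."""
    out = apply_order(tokens, order)
    rate = sum(a != b for a, b in zip(tokens, out)) / len(tokens)
    return rate + toy_model(out)[1]

def mutate(order, rng):
    """Swap two positions in the replacement priority."""
    child = list(order)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def crossover(a, b, rng):
    """Copy a prefix of `a`, then fill the remaining positions in `b`'s order."""
    cut = rng.randrange(1, len(a))
    head = a[:cut]
    return head + [g for g in b if g not in head]

def agps_sketch(tokens, pop_size=8, generations=30, t0=1.0, cooling=0.9, seed=0):
    rng = random.Random(seed)
    n = len(tokens)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = max(pop, key=lambda o: fitness(tokens, o))
    temp = t0
    for _ in range(generations):
        for k in range(pop_size):
            mate = pop[rng.randrange(pop_size)]
            child = mutate(crossover(pop[k], mate, rng), rng)
            delta = fitness(tokens, child) - fitness(tokens, pop[k])
            # Annealing step: always accept improvements; accept a worse child
            # with probability exp(delta / temp), which shrinks as temp cools.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                pop[k] = child
            if fitness(tokens, child) > fitness(tokens, best):
                best = child
        temp *= cooling
    return apply_order(tokens, best), fitness(tokens, best)

text = "i hate this awful movie".split()
rubbish, score = agps_sketch(text)
print(" ".join(rubbish), round(score, 3))
```

In this toy setting the replacement order matters because keeping the more strongly weighted word preserves more confidence. The annealing acceptance lets an individual temporarily move to a worse order early in the search, which is the role the abstract attributes to the annealing component.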


