Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation

Chen Li, Xinghao Yang, Baodi Liu, Weifeng Liu, Honglong Chen
International Joint Conference on Artificial Intelligence (IJCAI), published 2023-08-01. DOI: 10.24963/ijcai.2023/569 (https://doi.org/10.24963/ijcai.2023/569)

Abstract

Modern Natural Language Processing (NLP) models are under-sensitive to text rubbish examples: heavily modified inputs that are nonsensical to humans yet leave the model’s prediction unchanged. Prior work crafts rubbish examples by iteratively deleting words, using beam search to determine the deletion order. However, the resulting examples usually reduce model confidence and sometimes remain human-readable. To address these problems, we propose an Annealing Genetic based Preposition Substitution (AGPS) algorithm for text rubbish example generation, which has two major merits. First, AGPS crafts rubbish examples by substituting input words with meaningless prepositions instead of removing them, which degrades the model’s confidence less. Second, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to escape local optima with some probability. This matters for achieving both objectives, namely a high word modification rate and high model confidence. Experimental results on five popular datasets demonstrate the superiority of AGPS over the baseline and expose a fact: NLP models cannot really understand the semantics of sentences, as they give the same prediction, with even higher confidence, for nonsensical preposition sequences.
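The two ideas in the abstract, preposition substitution and an annealing-style escape from local optima inside a genetic search, can be illustrated with a small sketch. The abstract gives no algorithmic details, so everything below is an assumption made for illustration: the preposition pool, the toy bag-of-words classifier, the fitness (modification rate plus confidence), the permutation encoding of replacement priority, and the Metropolis acceptance rule borrowed from standard simulated annealing. It should not be read as the authors' implementation.

```python
import math
import random

# Hypothetical preposition pool; the abstract does not list the one AGPS uses.
PREPOSITIONS = ["of", "in", "on", "at", "by", "to", "for", "with"]

# Toy stand-in for an NLP classifier (an assumption, not the paper's model).
WEIGHTS = {"great": 2.0, "love": 1.5, "boring": -2.0, "bad": -1.5}
BIAS = -1.0

def toy_model(tokens):
    """Return (label, confidence) from a biased bag-of-words score."""
    score = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return (1, p_pos) if p_pos >= 0.5 else (0, 1.0 - p_pos)

def decode(tokens, priority):
    """Substitute prepositions in priority order, skipping label-changing swaps."""
    label, _ = toy_model(tokens)
    out = list(tokens)
    for pos in priority:
        trial = list(out)
        trial[pos] = PREPOSITIONS[pos % len(PREPOSITIONS)]
        if toy_model(trial)[0] == label:
            out = trial
    return out

def fitness(tokens, priority):
    """Assumed objective: word modification rate plus model confidence."""
    rubbish = decode(tokens, priority)
    changed = sum(a != b for a, b in zip(tokens, rubbish))
    return changed / len(tokens) + toy_model(rubbish)[1]

def anneal_accept(f_old, f_new, temperature, rng):
    """Metropolis rule: keep improvements, sometimes keep regressions."""
    if f_new >= f_old:
        return True
    return rng.random() < math.exp((f_new - f_old) / temperature)

def order_crossover(a, b, rng):
    """Order crossover, so that children remain valid permutations."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j + 1]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]

def swap_mutation(perm, rng):
    """Swap two positions of the priority permutation."""
    child = list(perm)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def agps_sketch(tokens, pop_size=8, generations=30, t0=1.0, cooling=0.9, seed=0):
    """Genetic search over replacement priorities with annealed acceptance."""
    rng = random.Random(seed)
    n = len(tokens)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = max(population, key=lambda p: fitness(tokens, p))
    temperature = t0
    for _ in range(generations):
        for k, parent in enumerate(population):
            mate = rng.choice(population)
            child = swap_mutation(order_crossover(parent, mate, rng), rng)
            f_parent, f_child = fitness(tokens, parent), fitness(tokens, child)
            if anneal_accept(f_parent, f_child, temperature, rng):
                population[k] = child
            if f_child > fitness(tokens, best):
                best = child
        temperature *= cooling  # lower temperature: fewer regressions accepted
    return decode(tokens, best), fitness(tokens, best)

tokens = "i love this great movie".split()
rubbish, score = agps_sketch(tokens)
print(" ".join(rubbish), round(score, 3))
```

In this toy setting the order of replacement matters: replacing "love" before "great" leaves the stronger cue in place and yields higher confidence than the reverse order, which is the kind of ordering effect a priority search is meant to exploit. Early on, the high temperature lets a worse child replace its parent fairly often; as the temperature decays, the search behaves more like a plain greedy GA.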