Title: Unicode Evil: Evading NLP Systems Using Visual Similarities of Text Characters
Authors: A. Dionysiou, E. Athanasopoulos
DOI: 10.1145/3474369.3486871
Published in: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security
Publication date: 2021-11-15
Citations: 5
Abstract
Adversarial Text Generation Frameworks (ATGFs) aim to cause a Natural Language Processing (NLP) machine to misbehave, i.e., to misclassify a given input. In this paper, we propose EvilText, a general ATGF that successfully evades some of the most popular NLP machines by efficiently perturbing a given legitimate text while preserving the original text's semantics and human readability. Perturbations are based on classes of visually similar characters in the Unicode set. EvilText can be used by NLP service operators to evaluate the security and robustness of their systems. Furthermore, EvilText outperforms state-of-the-art ATGFs in terms of: (a) effectiveness, (b) efficiency, and (c) preservation of the original text's semantics and human readability. We evaluate EvilText on some of the most popular NLP systems used for sentiment analysis and toxic content detection. We further expand on the generality and transferability of our ATGF, while also exploring possible countermeasures for defending against our attacks. Surprisingly, naive defence mechanisms fail to mitigate our attacks; the only promising one is restricting the use of Unicode characters. However, we argue that restricting the use of Unicode characters imposes a significant trade-off between security and usability, as almost all websites rely heavily on Unicode support.
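The core perturbation idea described in the abstract, replacing characters with visually near-identical Unicode code points so that the text looks unchanged to a human but differs at the byte level for an NLP model, can be sketched minimally in Python. The homoglyph map below is an illustrative assumption (a few well-known Latin-to-Cyrillic confusables), not the authors' actual EvilText character classes or selection strategy:

```python
# Illustrative homoglyph substitution (not the EvilText implementation):
# each mapped Latin letter is swapped for a visually similar Cyrillic
# character, so the rendered text reads the same while its underlying
# code points differ, which can confuse tokenizers and classifiers.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
}

def perturb(text: str) -> str:
    """Replace every mapped character with its visual look-alike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

original = "nice place"
evaded = perturb(original)
print(evaded)               # renders almost identically to "nice place"
print(original == evaded)   # False: the strings differ at the code-point level
```

A real attack framework would perturb selectively (e.g., only the characters needed to flip a target model's prediction) rather than rewriting every mapped character, but this sketch shows why the semantics and readability of the original text are preserved: the substitution changes encoding, not appearance.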