Non-Alpha-Num: a novel architecture for generating adversarial examples for bypassing NLP-based clickbait detection mechanisms

IF 2.4 · CAS Zone 4 (Computer Science) · JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS
Ashish Bajaj, Dinesh Kumar Vishwakarma
DOI: 10.1007/s10207-024-00861-9
Journal: International Journal of Information Security
Published: 2024-05-13 (Journal Article)
Citations: 0

Abstract

Most online media outlets rely heavily on revenue generated by their readers' views, and because such outlets are abundant, they must compete for reader attention. Publishers commonly employ attention-grabbing headlines to entice users to visit their websites. These headlines, known as clickbait, strategically exploit the curiosity gap experienced by users, enticing them to click on hyperlinks that frequently fail to meet their expectations. Clickbait identification is therefore a significant NLP application, and previous studies have shown that language models can detect clickbait effectively. Although deep learning models have achieved great success on text-based tasks, they are vulnerable to adversarial modifications: attacks that make imperceptible alterations to a small number of words or characters in order to craft deceptive text that misleads the model into incorrect predictions. The present work introduces "Non-Alpha-Num", a newly proposed character-level textual adversarial attack that operates in a black-box setting. The primary goal is to manipulate a target NLP model such that the alterations made to the input are undetectable by human observers. A series of comprehensive experiments evaluated the efficacy of the proposed attack on several widely used models, including Word-CNN, BERT, DistilBERT, ALBERT, RoBERTa, and XLNet, each fine-tuned on a clickbait dataset commonly employed for clickbait detection. The empirical evidence shows that the proposed attack routinely achieves much higher attack success rates (ASR) and produces higher-quality adversarial examples than traditional adversarial manipulations. The findings suggest that clickbait detection systems can be circumvented, which may have significant implications for current policy efforts.
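To make the attack family concrete, the following is a minimal, hypothetical sketch of a character-level non-alphanumeric perturbation. It is not the authors' Non-Alpha-Num algorithm: the paper's attack selects target words by querying the victim model in a black-box setting, whereas this sketch simply perturbs the longest words as a crude stand-in for word importance. The function names and the character set are illustrative assumptions.

```python
import random

# Illustrative set of non-alphanumeric characters to inject; the actual
# characters used by the Non-Alpha-Num attack may differ.
NON_ALNUM = ["-", "_", ".", "*"]

def perturb_word(word, rng):
    """Insert one non-alphanumeric character at a random interior position."""
    if len(word) < 3:
        return word
    pos = rng.randrange(1, len(word) - 1)
    return word[:pos] + rng.choice(NON_ALNUM) + word[pos:]

def perturb_headline(headline, n_words=2, seed=0):
    """Perturb the n longest words; a real attack would instead rank words
    by how much masking them changes the victim model's prediction."""
    rng = random.Random(seed)
    words = headline.split()
    targets = sorted(range(len(words)), key=lambda i: -len(words[i]))[:n_words]
    for i in targets:
        words[i] = perturb_word(words[i], rng)
    return " ".join(words)

print(perturb_headline("You Will Never Believe What Happened Next"))
```

Because most tokenizers split on non-alphanumeric characters, even one inserted symbol can fragment a word into unfamiliar subword tokens and flip a classifier's prediction, while a human reader still parses the headline effortlessly.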


Journal
International Journal of Information Security (Engineering & Technology – Computer Science: Theory & Methods)
CiteScore: 6.30
Self-citation rate: 3.10%
Articles per year: 52
Review time: 12 months
期刊介绍: The International Journal of Information Security is an English language periodical on research in information security which offers prompt publication of important technical work, whether theoretical, applicable, or related to implementation. Coverage includes system security: intrusion detection, secure end systems, secure operating systems, database security, security infrastructures, security evaluation; network security: Internet security, firewalls, mobile security, security agents, protocols, anti-virus and anti-hacker measures; content protection: watermarking, software protection, tamper resistant software; applications: electronic commerce, government, health, telecommunications, mobility.