Adversarial Email Generation against Spam Detection Models through Feature Perturbation

Qi Cheng, Anyi Xu, Xiangyang Li, Leah Ding
{"title":"Adversarial Email Generation against Spam Detection Models through Feature Perturbation","authors":"Qi Cheng, Anyi Xu, Xiangyang Li, Leah Ding","doi":"10.1109/ICAA52185.2022.00019","DOIUrl":null,"url":null,"abstract":"Machine learning-based spam detection models learn from a set of labeled training data and detect spam emails after the training phase. We study a class of vulnerabilities of such detection models, where the attack can manipulate a trained model to misclassify maliciously crafted spam emails at the detection phase. However, very often feature extraction methods make it very difficult to translate the change in the feature space to that in the textual email space. This paper proposes a new attack method of making guided changes to text data by taking advantage of findings of generated adversarial examples that purposely modify the features representing an email. We study different feature extraction methods using various Natural Language Processing (NLP) techniques. We develop effective methods to translate adversarial perturbations in the feature space back to a set of “magic words”, or malicious words, in the text space, which can cause desirable misclassifications from the attacker’s perspective. We show that our attacks are effective across different datasets and various machine learning methods in white-box, gray-box, and black-box attack settings. Finally, we discuss preliminary exploration to counter such attacks. We hope our findings and analysis will allow future work to perform additional studies of defensive solutions against this new class of attacks.","PeriodicalId":206047,"journal":{"name":"2022 IEEE International Conference on Assured Autonomy (ICAA)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Assured Autonomy (ICAA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAA52185.2022.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Machine learning-based spam detection models learn from a set of labeled training data and detect spam emails after the training phase. We study a class of vulnerabilities of such detection models, in which an attacker crafts malicious spam emails that a trained model misclassifies at detection time. However, feature extraction methods often make it difficult to translate a change in the feature space into a corresponding change in the textual email space. This paper proposes a new attack method that makes guided changes to text data by exploiting adversarial examples that purposely modify the features representing an email. We study different feature extraction methods built on various Natural Language Processing (NLP) techniques. We develop effective methods to translate adversarial perturbations in the feature space back into a set of "magic words", or malicious words, in the text space, which cause misclassifications that are desirable from the attacker's perspective. We show that our attacks are effective across different datasets and various machine learning methods in white-box, gray-box, and black-box attack settings. Finally, we discuss a preliminary exploration of countermeasures against such attacks. We hope our findings and analysis will enable future studies of defensive solutions against this new class of attacks.
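To make the feature-to-text translation concrete, the following is a minimal sketch of the general idea, assuming a TF-IDF bag-of-words representation and a linear (logistic regression) classifier. The toy corpus, the model choice, and the word-selection heuristic are illustrative assumptions for this sketch, not the paper's exact pipeline or datasets.

```python
# Sketch: translate a feature-space adversarial direction back into "magic
# words" that push a spam email toward the ham class. Illustrative only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labeled corpus (1 = spam, 0 = ham); a real experiment would use a
# benchmark spam dataset.
emails = [
    "win a free prize now", "cheap meds limited offer",
    "meeting agenda for monday", "please review the attached report",
]
labels = [1, 1, 0, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(emails)
clf = LogisticRegression().fit(X, labels)

# For a linear model, the gradient of the spam logit with respect to the
# features is just the coefficient vector, so the feature-space perturbation
# that most decreases the spam score points along the most negative
# coefficients. The corresponding vocabulary entries are candidate
# "magic words".
coefs = clf.coef_[0]
vocab = np.array(vec.get_feature_names_out())
magic_words = vocab[np.argsort(coefs)[:5]]  # strongest ham-indicating words
print("candidate magic words:", magic_words)

# Appending these words to a spam email shifts its feature vector toward
# the ham region of the feature space.
spam = "win a free prize now"
perturbed = spam + " " + " ".join(magic_words)
print("spam score before:", clf.predict_proba(vec.transform([spam]))[0, 1])
print("spam score after: ", clf.predict_proba(vec.transform([perturbed]))[0, 1])
```

For nonlinear or black-box models this coefficient shortcut no longer applies, and the perturbation direction would have to be estimated some other way, for example from the gradients of a surrogate model; that substitution is an assumption of this sketch, not the procedure described in the paper.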