Directed adversarial sampling attacks on phishing detection

J. Comput. Secur. Pub Date: 2021-02-03. DOI: 10.3233/JCS-191411

H. Shirazi, Bruhadeshwar Bezawada, I. Ray, Charles Anderson

Citations: 5

Abstract

Phishing websites trick honest users into believing that they are interacting with a legitimate website and capture sensitive information, such as user names, passwords, credit card numbers, and other personal information. Machine learning is a promising technique for distinguishing between phishing and legitimate websites. However, machine learning approaches are susceptible to adversarial learning attacks, in which a phishing sample can bypass classifiers. We investigate the robustness of machine learning-based phishing detection in the face of such attacks, and our experiments on publicly available datasets reveal that these detection mechanisms are vulnerable to them. We propose a practical approach to simulate such attacks by generating adversarial samples through direct feature manipulation. To enhance a sample’s success probability, we describe a clustering approach that guides an attacker to select the best possible phishing samples, those that can bypass the classifier by appearing as legitimate samples. We define the notion of a vulnerability level for each dataset, which measures the number of features that can be manipulated and the cost of such manipulation. Further, we cluster phishing samples and show that some clusters of samples are more likely to exhibit higher vulnerability levels than others. This helps an adversary identify the best candidate phishing samples from which to generate adversarial samples at a lower cost. Our findings can be used to refine the dataset and develop better learning models to compensate for the weak samples in the training dataset.
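The idea of generating adversarial samples through direct feature manipulation, and of counting how many features must change, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy binary feature vectors, the threshold rule standing in for a trained classifier, the Hamming-distance cost, and the helper names `classify`, `cheapest_evasion`, and `evasion_rate`. It is not the paper's algorithm or its definition of vulnerability level, and it omits the clustering step.

```python
from itertools import combinations

# Toy data (assumed): each sample is a tuple of binary features,
# where 1 means a suspicious trait is present.
PHISHING = [(1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 1)]
LEGITIMATE = [(0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]


def classify(sample, threshold=2):
    """Hypothetical stand-in for a trained classifier."""
    return "phishing" if sum(sample) >= threshold else "legitimate"


def adversarial_candidates(sample, legit_samples, n_features):
    """Yield (candidate, cost) pairs made by overwriting n_features
    positions of `sample` with values seen in legitimate samples."""
    seen = set()
    for idxs in combinations(range(len(sample)), n_features):
        for legit in legit_samples:
            candidate = list(sample)
            for i in idxs:
                candidate[i] = legit[i]
            candidate = tuple(candidate)
            # Cost (assumed): number of feature values actually changed.
            cost = sum(a != b for a, b in zip(sample, candidate))
            if cost > 0 and candidate not in seen:
                seen.add(candidate)
                yield candidate, cost


def cheapest_evasion(sample, legit_samples, max_features):
    """Return the lowest-cost (candidate, cost) that the classifier
    labels legitimate, or None if none exists within max_features."""
    for k in range(1, max_features + 1):
        best = None
        for candidate, cost in adversarial_candidates(sample, legit_samples, k):
            if classify(candidate) == "legitimate":
                if best is None or cost < best[1]:
                    best = (candidate, cost)
        if best is not None:
            return best
    return None


def evasion_rate(phishing_samples, legit_samples, max_features):
    """Fraction of phishing samples that can evade the classifier
    by changing at most max_features features."""
    evaded = sum(
        cheapest_evasion(s, legit_samples, max_features) is not None
        for s in phishing_samples
    )
    return evaded / len(phishing_samples)


if __name__ == "__main__":
    for budget in (1, 2, 3):
        print(budget, evasion_rate(PHISHING, LEGITIMATE, budget))
```

In this toy setting, samples that need fewer feature changes to evade the classifier are the cheaper candidates for an attacker, which is the intuition behind ranking samples (or clusters of samples) by how vulnerable they are.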
