Building Robust Phishing Detection System: an Empirical Analysis

Jehyun Lee, Pingxiao Ye, Ruofan Liu, D. Divakaran, M. Chan
{"title":"构建稳健的网络钓鱼检测系统:实证分析","authors":"Jehyun Lee, Pingxiao Ye, Ruofan Liu, D. Divakaran, M. Chan","doi":"10.14722/madweb.2020.23007","DOIUrl":null,"url":null,"abstract":"To tackle phishing attacks, recent research works have resorted to the application of machine learning (ML) algorithms, yielding promising results. Often, a binary classification model is trained on labeled datasets of benign and phishing URLs (and contents) obtained via crawling. While phishing classifiers have high accuracy (precision and recall), they, however, are also prone to adversarial attacks wherein an adversary tries to evade the ML-based classifier by mimicking (feature values of) benign web pages. Based on this observation, in our work, we propose a simple approach to build a robust phishing page detection system. Our detection system, based on voting, employs multiple models, such that each model is trained by inserting (controlled) noises in a subset of randomly selected features from the full feature set. We conduct comprehensive experiments using real datasets, and based on a number of evasive strategies, evaluate the robustness of, both, the traditional native ML model and our proposed detection system. The results demonstrate that our proposed system, on one hand, performs close to the native model when there is no adversarial attack, and on the other hand, is more robust against evasion attacks than the native model.","PeriodicalId":408238,"journal":{"name":"Proceedings 2020 Workshop on Measurements, Attacks, and Defenses for the Web","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Building Robust Phishing Detection System: an Empirical Analysis\",\"authors\":\"Jehyun Lee, Pingxiao Ye, Ruofan Liu, D. Divakaran, M. Chan\",\"doi\":\"10.14722/madweb.2020.23007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To tackle phishing attacks, recent research works have resorted to the application of machine learning (ML) algorithms, yielding promising results. Often, a binary classification model is trained on labeled datasets of benign and phishing URLs (and contents) obtained via crawling. While phishing classifiers have high accuracy (precision and recall), they, however, are also prone to adversarial attacks wherein an adversary tries to evade the ML-based classifier by mimicking (feature values of) benign web pages. Based on this observation, in our work, we propose a simple approach to build a robust phishing page detection system. Our detection system, based on voting, employs multiple models, such that each model is trained by inserting (controlled) noises in a subset of randomly selected features from the full feature set. We conduct comprehensive experiments using real datasets, and based on a number of evasive strategies, evaluate the robustness of, both, the traditional native ML model and our proposed detection system. 
The results demonstrate that our proposed system, on one hand, performs close to the native model when there is no adversarial attack, and on the other hand, is more robust against evasion attacks than the native model.\",\"PeriodicalId\":408238,\"journal\":{\"name\":\"Proceedings 2020 Workshop on Measurements, Attacks, and Defenses for the Web\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 2020 Workshop on Measurements, Attacks, and Defenses for the Web\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14722/madweb.2020.23007\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 2020 Workshop on Measurements, Attacks, and Defenses for the Web","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14722/madweb.2020.23007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15

Abstract

To tackle phishing attacks, recent research works have resorted to the application of machine learning (ML) algorithms, yielding promising results. Often, a binary classification model is trained on labeled datasets of benign and phishing URLs (and contents) obtained via crawling. While phishing classifiers have high accuracy (precision and recall), they are, however, also prone to adversarial attacks, in which an adversary tries to evade the ML-based classifier by mimicking (the feature values of) benign web pages. Based on this observation, in our work, we propose a simple approach to build a robust phishing page detection system. Our detection system, based on voting, employs multiple models, such that each model is trained by inserting (controlled) noise in a subset of randomly selected features from the full feature set. We conduct comprehensive experiments using real datasets, and based on a number of evasive strategies, evaluate the robustness of both the traditional native ML model and our proposed detection system. The results demonstrate that our proposed system, on the one hand, performs close to the native model when there is no adversarial attack, and on the other hand, is more robust against evasion attacks than the native model.
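The abstract describes the core mechanism only at a high level: train several models, each on data where controlled noise has been injected into a randomly chosen subset of features, and combine their predictions by majority vote. The sketch below illustrates that idea under stated assumptions; the choice of RandomForestClassifier, Gaussian noise, and the parameters n_models, noise_fraction, and sigma are illustrative and not the authors' exact configuration.

```python
# Minimal sketch of a voting-based phishing detector with per-model feature noise.
# Assumptions: features are numeric (a NumPy matrix X), labels y are 0/1, and the
# base learner / noise model are placeholders for whatever the paper actually uses.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class NoisyVotingPhishingDetector:
    def __init__(self, n_models=10, noise_fraction=0.3, sigma=0.1, seed=0):
        self.n_models = n_models              # number of ensemble members
        self.noise_fraction = noise_fraction  # fraction of features perturbed per model
        self.sigma = sigma                    # std. dev. of the injected Gaussian noise
        self.rng = np.random.default_rng(seed)
        self.models = []

    def fit(self, X, y):
        n_features = X.shape[1]
        k = max(1, int(self.noise_fraction * n_features))
        for _ in range(self.n_models):
            # Pick a random subset of features and perturb only those columns.
            subset = self.rng.choice(n_features, size=k, replace=False)
            X_noisy = X.astype(float).copy()
            X_noisy[:, subset] += self.rng.normal(0.0, self.sigma, size=(X.shape[0], k))
            model = RandomForestClassifier(n_estimators=100, random_state=0)
            model.fit(X_noisy, y)
            self.models.append(model)
        return self

    def predict(self, X):
        # Majority vote over the individual models' binary predictions.
        votes = np.stack([m.predict(X) for m in self.models], axis=0)
        return (votes.mean(axis=0) >= 0.5).astype(int)
```

The intuition, per the abstract, is that an adversary who mimics benign feature values must evade many differently perturbed models at once, while under benign conditions the vote stays close to a single natively trained model.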