Exploring Efficiency of GAN-based Generated URLs for Phishing URL Detection

T. Pham, Thi Thanh Thuy Pham, Sy Tuong Hoang, Viet-Cuong Ta
Published in: 2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR)
Publication date: 2021-10-01
DOI: https://doi.org/10.1109/MAPR53640.2021.9585287
Citations: 4

Abstract

A URL (Uniform Resource Locator) refers to a resource on the Internet by providing a hyperlink to a website. Different resources are referenced by different network addresses, that is, by different URLs. As a result, embedding malware on websites through malicious URLs is one of the most dangerous types of cyberattack today and poses a serious threat to the safety of systems. To detect phishing URLs, the most common recent approach is to train deep learning networks on a large number of URL samples, both malicious and benign. However, the available URL databases contain only a modest number of samples. A further disadvantage of these databases is the imbalanced distribution of malicious and non-malicious URL strings. Malicious URLs are difficult to collect or keep up to date because they exist only for a short time: once detected, they are changed again and again. To address this challenge, we propose training a GAN, specifically WGAN-GP, to generate malicious URLs from the available phishing URL data. We then integrate the generated phishing URLs into the existing URL database and train two URL classifiers, LSTM and GRU, to compare their results. Experiments with different quantities of URL samples show that URL classification improves when WGAN-GP augmentation is used with the LSTM classifier.
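The augmentation step described in the abstract (adding generated phishing URLs to an imbalanced database before training a character-level classifier) can be illustrated with a minimal sketch. It does not implement WGAN-GP or the LSTM/GRU classifiers, and the character vocabulary, the maximum length, the balancing rule, and the names `encode_url` and `augment` are all assumptions made for illustration; the abstract does not specify any of them.

```python
import random
from typing import List, Tuple

# Hypothetical character vocabulary; the alphabet actually used in the paper is not given.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
# Id 0 is reserved for both padding and characters outside the vocabulary.
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}


def encode_url(url: str, max_len: int = 32) -> List[int]:
    """Map a URL to a fixed-length list of integer ids (truncate, then right-pad with 0)."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def augment(
    benign: List[str],
    phishing: List[str],
    generated: List[str],
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Add generated phishing URLs until the classes are balanced
    or the pool of generated URLs is exhausted (an assumed balancing rule)."""
    rng = random.Random(seed)
    deficit = max(0, len(benign) - len(phishing))
    extra = rng.sample(generated, min(deficit, len(generated)))
    # Label 0 = benign, label 1 = phishing (real or generated).
    data = [(u, 0) for u in benign] + [(u, 1) for u in phishing + extra]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    benign = ["example.com/home", "example.org/docs", "example.net/a", "example.com/b"]
    phishing = ["login-example.test/verify"]
    generated = [f"secure-{i}.example.test/login" for i in range(10)]  # stand-ins for GAN output
    dataset = augment(benign, phishing, generated)
    print(len(dataset), sum(label for _, label in dataset))
    print(encode_url("ab.c", max_len=6))
```

In this sketch the generated URLs are only placeholders; in the pipeline the abstract describes, they would come from the trained WGAN-GP generator, and the encoded sequences would feed the LSTM or GRU classifier.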