基于GAN的网络钓鱼URL分类器模型的评估

Q1 Mathematics
Thi Thanh Thuy Pham, T. Pham, Viet-Cuong Ta
{"title":"基于GAN的网络钓鱼URL分类器模型的评估","authors":"Thi Thanh Thuy Pham, T. Pham, Viet-Cuong Ta","doi":"10.5815/ijcnis.2023.02.01","DOIUrl":null,"url":null,"abstract":"Phishing attacks by malicious URL/web links are common nowadays. The user data, such as login credentials and credit card numbers can be stolen by their careless clicking on these links. Moreover, this can lead to installation of malware on the target systems to freeze their activities, perform ransomware attack or reveal sensitive information. Recently, GAN-based models have been attractive for anti-phishing URLs. The general motivation is using Generator network (G) to generate fake URL strings and Discriminator network (D) to distinguish the real and the fake URL samples. This is operated in adversarial way between G and D so that the synthesized URL samples by G become more and more similar to the real ones. From the perspective of cybersecurity defense, GAN-based motivation can be exploited for D as a phishing URL detector or classifier. This means after training GAN on both malign and benign URL strings, a strong classifier/detector D can be achieved. From the perspective of cyberattack, the attackers would like to to create fake URLs that are as close to the real ones as possible to perform phishing attacks. This makes them easier to fool users and detectors. In the related proposals, GAN-based models are mainly exploited for anti-phishing URLs. There have been no evaluations specific for GAN-generated fake URLs. The attacker can make use of these URL strings for phishing attacks. In this work, we propose to use TLD (Top-level Domain) and SSIM (Structural Similarity Index Score) scores for evaluation the GAN-synthesized URL strings in terms of the structural similariy with the real ones. The more similar in the structure of the GAN-generated URLs are to the real ones, the more likely they are to fool the classifiers. Different GAN models from basic GAN to others GAN extensions of DCGAN, WGAN, SEQGAN are explored in this work. We show from the intensive experiments that D classifier of basic GAN and DCGAN surpasses other GAN models of WGAN and SegGAN. The effectiveness of the fake URL patterns generated from SeqGAN is the best compared to other GAN models in both structural similarity and the ability in deceiving the phishing URL classifiers of LSTM (Long Short Term Memory) and RF (Random Forest).","PeriodicalId":36488,"journal":{"name":"International Journal of Computer Network and Information Security","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Evaluation of GAN-based Models for Phishing URL Classifiers\",\"authors\":\"Thi Thanh Thuy Pham, T. Pham, Viet-Cuong Ta\",\"doi\":\"10.5815/ijcnis.2023.02.01\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Phishing attacks by malicious URL/web links are common nowadays. The user data, such as login credentials and credit card numbers can be stolen by their careless clicking on these links. Moreover, this can lead to installation of malware on the target systems to freeze their activities, perform ransomware attack or reveal sensitive information. Recently, GAN-based models have been attractive for anti-phishing URLs. The general motivation is using Generator network (G) to generate fake URL strings and Discriminator network (D) to distinguish the real and the fake URL samples. This is operated in adversarial way between G and D so that the synthesized URL samples by G become more and more similar to the real ones. From the perspective of cybersecurity defense, GAN-based motivation can be exploited for D as a phishing URL detector or classifier. This means after training GAN on both malign and benign URL strings, a strong classifier/detector D can be achieved. From the perspective of cyberattack, the attackers would like to to create fake URLs that are as close to the real ones as possible to perform phishing attacks. This makes them easier to fool users and detectors. In the related proposals, GAN-based models are mainly exploited for anti-phishing URLs. There have been no evaluations specific for GAN-generated fake URLs. The attacker can make use of these URL strings for phishing attacks. In this work, we propose to use TLD (Top-level Domain) and SSIM (Structural Similarity Index Score) scores for evaluation the GAN-synthesized URL strings in terms of the structural similariy with the real ones. The more similar in the structure of the GAN-generated URLs are to the real ones, the more likely they are to fool the classifiers. Different GAN models from basic GAN to others GAN extensions of DCGAN, WGAN, SEQGAN are explored in this work. We show from the intensive experiments that D classifier of basic GAN and DCGAN surpasses other GAN models of WGAN and SegGAN. The effectiveness of the fake URL patterns generated from SeqGAN is the best compared to other GAN models in both structural similarity and the ability in deceiving the phishing URL classifiers of LSTM (Long Short Term Memory) and RF (Random Forest).\",\"PeriodicalId\":36488,\"journal\":{\"name\":\"International Journal of Computer Network and Information Security\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Computer Network and Information Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5815/ijcnis.2023.02.01\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Mathematics\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Network and Information Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/ijcnis.2023.02.01","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Mathematics","Score":null,"Total":0}
引用次数: 2

摘要

通过恶意URL/网页链接进行的网络钓鱼攻击十分常见。用户数据,如登录凭据和信用卡号码,可以通过他们不小心点击这些链接窃取。此外,这可能导致在目标系统上安装恶意软件,以冻结其活动,执行勒索软件攻击或泄露敏感信息。最近,基于gan的模型对反网络钓鱼url很有吸引力。一般动机是使用Generator网络(G)生成假URL字符串,使用Discriminator网络(D)区分真实和虚假URL样本。这在G和D之间以对抗的方式进行操作,使得G合成的URL样本越来越接近真实的URL样本。从网络安全防御的角度来看,基于gan的动机可以利用D作为网络钓鱼URL检测器或分类器。这意味着在对恶意和良性URL字符串进行GAN训练之后,可以实现强大的分类器/检测器D。从网络攻击的角度来看,攻击者希望创建尽可能接近真实url的假url来进行网络钓鱼攻击。这使得它们更容易欺骗用户和检测器。在相关提案中,基于gan的模型主要用于反钓鱼url。目前还没有针对gan生成的假url的具体评估。攻击者可以利用这些URL字符串进行网络钓鱼攻击。在这项工作中,我们建议使用TLD(顶级域)和SSIM(结构相似指数得分)分数来评估gan合成的URL字符串与真实字符串的结构相似度。gan生成的url的结构与真实的url越相似,它们就越有可能欺骗分类器。本文探讨了不同的GAN模型,从基本GAN到DCGAN、WGAN、SEQGAN的其他GAN扩展。我们从大量的实验中发现,基本GAN和DCGAN的D分类器优于WGAN和SegGAN的其他GAN模型。与其他GAN模型相比,SeqGAN生成的假URL模式在结构相似性和欺骗LSTM(长短期记忆)和RF(随机森林)的钓鱼URL分类器的能力方面都是最好的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Evaluation of GAN-based Models for Phishing URL Classifiers
Phishing attacks by malicious URL/web links are common nowadays. The user data, such as login credentials and credit card numbers can be stolen by their careless clicking on these links. Moreover, this can lead to installation of malware on the target systems to freeze their activities, perform ransomware attack or reveal sensitive information. Recently, GAN-based models have been attractive for anti-phishing URLs. The general motivation is using Generator network (G) to generate fake URL strings and Discriminator network (D) to distinguish the real and the fake URL samples. This is operated in adversarial way between G and D so that the synthesized URL samples by G become more and more similar to the real ones. From the perspective of cybersecurity defense, GAN-based motivation can be exploited for D as a phishing URL detector or classifier. This means after training GAN on both malign and benign URL strings, a strong classifier/detector D can be achieved. From the perspective of cyberattack, the attackers would like to to create fake URLs that are as close to the real ones as possible to perform phishing attacks. This makes them easier to fool users and detectors. In the related proposals, GAN-based models are mainly exploited for anti-phishing URLs. There have been no evaluations specific for GAN-generated fake URLs. The attacker can make use of these URL strings for phishing attacks. In this work, we propose to use TLD (Top-level Domain) and SSIM (Structural Similarity Index Score) scores for evaluation the GAN-synthesized URL strings in terms of the structural similariy with the real ones. The more similar in the structure of the GAN-generated URLs are to the real ones, the more likely they are to fool the classifiers. Different GAN models from basic GAN to others GAN extensions of DCGAN, WGAN, SEQGAN are explored in this work. We show from the intensive experiments that D classifier of basic GAN and DCGAN surpasses other GAN models of WGAN and SegGAN. The effectiveness of the fake URL patterns generated from SeqGAN is the best compared to other GAN models in both structural similarity and the ability in deceiving the phishing URL classifiers of LSTM (Long Short Term Memory) and RF (Random Forest).
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
4.10
自引率
0.00%
发文量
33
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信