Improving Phishing Website Detection using a Hybrid Two-level Framework for Feature Selection and XGBoost Tuning

Impact factor: 0.7; Zone 4 (Computer Science); JCR Q4 (COMPUTER SCIENCE, SOFTWARE ENGINEERING)
Luka Jovanovic;Dijana Jovanovic;Milos Antonijevic;Bosko Nikolic;Nebojsa Bacanin;Miodrag Zivkovic;Ivana Strumberger
Journal of Web Engineering, vol. 22, no. 3, pp. 543-574. Published 2023-03-01. DOI: 10.13052/jwe1540-9589.2237
Full text: https://ieeexplore.ieee.org/document/10247501/
Citations: 1

Abstract

Over the last few decades, the World Wide Web has become a necessity, offering numerous services to end users. The number of online transactions grows daily, and so does the number of malicious actors. Machine learning plays a vital role in most modern solutions. To further improve Web security, this paper proposes a hybrid approach based on the eXtreme Gradient Boosting (XGBoost) machine learning model, optimized by an improved version of a well-known metaheuristic, the firefly algorithm. The improved firefly algorithm is employed within a two-tier framework, also developed as part of this research, to perform both feature selection and tuning of the XGBoost hyper-parameters. The hybrid model is evaluated on three well-known, publicly available phishing website datasets: the first two were provided by Mendeley Data, and the third was obtained from the University of California, Irvine machine learning repository. The proposed algorithm is also compared against cutting-edge metaheuristics used within the same framework. Additionally, the best-performing models were subjected to SHapley Additive exPlanations (SHAP) analysis to determine the impact of each feature on model decisions. The results suggest that the proposed hybrid solution outperforms the other approaches and that it is a promising solution in the domain of web security.
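The abstract does not give the details of the improved firefly algorithm or of the two-tier framework, so the following is only a minimal sketch of the general idea, using the basic (unimproved) firefly algorithm. It assumes one plausible encoding: a single vector in [0, 1] whose first entries are thresholded into a feature mask and whose last two entries are scaled into two XGBoost hyper-parameters. The names `decode`, `firefly_search`, and `toy_fitness`, the hyper-parameter ranges, and the toy objective are all hypothetical. In a real experiment the fitness function would train an XGBoost classifier on the selected features and return its validation error.

```python
import math
import random


def decode(position, n_features):
    """Split a flat [0, 1] vector into a feature mask and hyper-parameters."""
    mask = [x > 0.5 for x in position[:n_features]]
    # Assumed ranges, chosen for illustration only (not taken from the paper).
    learning_rate = 0.01 + position[n_features] * (0.9 - 0.01)
    max_depth = 1 + int(round(position[n_features + 1] * 9))
    return mask, {"learning_rate": learning_rate, "max_depth": max_depth}


def toy_fitness(position, n_features=6):
    """Stand-in for 'train XGBoost, return error'; lower is better."""
    mask, params = decode(position, n_features)
    useful = [True, True, False, False, True, False]  # made-up ground truth
    mismatches = sum(m != u for m, u in zip(mask, useful))
    return (mismatches
            + abs(params["learning_rate"] - 0.3)
            + abs(params["max_depth"] - 6) / 10)


def firefly_search(fitness, dim, n_fireflies=10, n_iter=30,
                   alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Basic firefly algorithm minimizing `fitness` over [0, 1]^dim."""
    rng = random.Random(seed)
    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    scores = [fitness(p) for p in swarm]
    best_i = min(range(n_fireflies), key=scores.__getitem__)
    best_pos, best_score = list(swarm[best_i]), scores[best_i]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if scores[j] < scores[i]:  # firefly j is brighter (lower error)
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction fades with distance
                    # Move i toward j, add a small random step, clamp to [0, 1].
                    swarm[i] = [
                        min(1.0, max(0.0, a + beta * (b - a)
                                     + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    scores[i] = fitness(swarm[i])
                    if scores[i] < best_score:
                        best_pos, best_score = list(swarm[i]), scores[i]
        alpha *= 0.97  # gradually reduce the random step size
    return best_pos, best_score


if __name__ == "__main__":
    position, score = firefly_search(toy_fitness, dim=8)
    print(decode(position, 6), score)
```

Encoding the feature mask and the hyper-parameters in one vector lets a single swarm search both jointly; the paper's actual two-tier arrangement may differ from this.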
Source journal
Journal of Web Engineering (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 1.80
Self-citation rate: 12.50%
Articles published: 62
Review time: 9 months
Journal description: The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information Web sites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.