PDSMV3-DCRNN: A novel ensemble deep learning framework for enhancing phishing detection and URL extraction

IF 5.4 | Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

Computers & Security Pub Date : 2024-09-17 DOI:10.1016/j.cose.2024.104123

Y. Bhanu Prasad , Venkatesulu Dondeti


Citations: 0

Abstract


PDSMV3-DCRNN: A novel ensemble deep learning framework for enhancing phishing detection and URL extraction

Phishing is a cyber-attack that exploits victims' technical ignorance or naivety and commonly involves a Uniform Resource Locator (URL). It is therefore useful to examine URLs before accessing them in order to spot a phishing attack. Several machine-learning algorithms have been proposed to detect phishing attempts. However, these approaches often perform poorly, with lower accuracy, longer response times, and higher false positive rates. Furthermore, many existing methods rely heavily on predefined feature sets, which may limit their adaptability and robustness. In contrast, our proposed method uses a more dynamic feature selection process, which includes the Conditional Wasserstein Generative Adversarial Network (CWGAN) for addressing data imbalance and the Binary Grey Goose Optimization Algorithm (BGGOA) for optimal feature selection. This dynamic approach enhances the model's ability to adapt to varying data characteristics, improving detection performance. The proposed solution is divided into two stages: pre-deployment and deployment. During the pre-deployment stage, the dataset is preprocessed, which includes transforming the data, handling irrelevant and redundant data, and balancing the classes. Minority samples are augmented using CWGAN to avoid class imbalance. Features are then selected using BGGOA, yielding a feature-reduced dataset that is used to train and test the ensemble deep learning classifiers, specifically the Novel Pyramid Depth-wise Separable-MobileNetV3 (PyDS-MV3) and the Deformable Convolutional Residual Neural Network (DCRNN), together termed PDSMV3-DCRNN. During the deployment stage, the Boosted ConvNeXt approach extracts URL features, which are fed into the trained classifier to predict "phishing" or "benign". According to the experimental findings, the proposed solution outperforms all other tested approaches, with a training time of 0.11 s and an accuracy of 99.21%.
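The data-preparation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: hand-crafted lexical URL features stand in for the Boosted ConvNeXt extractor, random duplication of minority rows stands in for CWGAN, and a fixed binary mask stands in for the mask that BGGOA would search for. All function names (`url_features`, `oversample_minority`, `apply_mask`) are hypothetical, and the classifier ensemble is omitted.

```python
import random
from urllib.parse import urlparse


def url_features(url):
    """Hand-crafted lexical features (assumption: a simple stand-in
    for the learned URL feature extractor named in the abstract)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        len(url),                          # total URL length
        host.count("."),                   # number of dots in the host
        sum(ch.isdigit() for ch in url),   # number of digits anywhere
        int("@" in url),                   # '@' is a common obfuscation sign
        int(parsed.scheme == "https"),     # 1 if the scheme is https
    ]


def oversample_minority(rows, labels, seed=0):
    """Duplicate random minority rows until every class has as many
    rows as the largest class (assumption: stand-in for CWGAN)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def apply_mask(rows, mask):
    """Keep only the columns whose mask bit is 1. A binary optimizer
    would search for the mask; here it is supplied by hand."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


if __name__ == "__main__":
    urls = [
        "https://example.com/",
        "https://example.org/docs",
        "https://example.net/about",
        "http://192.168.0.1/login@secure",
    ]
    labels = ["benign", "benign", "benign", "phishing"]
    rows = [url_features(u) for u in urls]
    rows, labels = oversample_minority(rows, labels)
    reduced = apply_mask(rows, [1, 1, 0, 1, 1])  # drop the digit count
    print(len(reduced), "rows,", len(reduced[0]), "features each")
```

In a fuller pipeline, the balanced and feature-reduced rows would be passed to the trained classifier; the mask here is fixed only to show how a binary selection vector reduces the feature set.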


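Source journal

Computers & Security (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 12.40

Self-citation rate: 7.10%

Articles published: 365

Review time: 10.7 months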

Journal description: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.