{"title":"PDSMV3-DCRNN: A novel ensemble deep learning framework for enhancing phishing detection and URL extraction","authors":"Y. Bhanu Prasad , Venkatesulu Dondeti","doi":"10.1016/j.cose.2024.104123","DOIUrl":null,"url":null,"abstract":"<div><div>Phishing is a cyber-attack that exploits victims' technical ignorance or naivety and commonly involves a Uniform Resources Locator (URL). As a result, it is beneficial to examine URLs before accessing them to spot a phishing assault. Several algorithms based on machine learning have been presented to detect phishing attempts. However, these approaches often suffer from lower performance outcomes, such as lower accuracy, longer response times, and higher false positive rates. Furthermore, many existing methods rely heavily on predefined feature sets, which may limit their adaptability and robustness. In contrast, our proposed method leverages a more dynamic feature selection process, which includes the Conditional Wasserstein Generative Adversarial Network (CWGAN) for addressing data imbalance and the Binary Grey Goose Optimization Algorithm (BGGOA) for optimal feature selection. This dynamic approach enhances the model's ability to adapt to varying data characteristics, improving detection performance. The proposed solution is divided into two stages: pre-deployment and deployment. During the pre-deployment stage, the dataset is preprocessed, including data transformation, handling irrelevant and redundant data, and ensuring data balancing. Minority samples are increased using CWGAN to avoid class imbalance. Features are then selected using BGGOA, resulting in a feature-reduced dataset used for training and testing ensemble deep learning classifiers, specifically the Novel Pyramid Depth-wise Separable-MobileNetV3 (PyDS-MV3) and Deformable Convolutional Residual Neural Network (DCRNN), termed PDSMV3-DCRNN. During the deployment phase, the Boosted ConvNeXt approach extracts URL features fed into the trained classifier to predict \"phishing\" or \"benign\". According to experimental findings, the proposed solution outperforms all other tested approaches, displaying a faster training time of 0.11 s and achieving an optimal accuracy of 99.21%.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"148 ","pages":"Article 104123"},"PeriodicalIF":4.8000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404824004280","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Phishing is a cyber-attack that exploits victims' technical ignorance or naivety and commonly involves a Uniform Resources Locator (URL). As a result, it is beneficial to examine URLs before accessing them to spot a phishing assault. Several algorithms based on machine learning have been presented to detect phishing attempts. However, these approaches often suffer from lower performance outcomes, such as lower accuracy, longer response times, and higher false positive rates. Furthermore, many existing methods rely heavily on predefined feature sets, which may limit their adaptability and robustness. In contrast, our proposed method leverages a more dynamic feature selection process, which includes the Conditional Wasserstein Generative Adversarial Network (CWGAN) for addressing data imbalance and the Binary Grey Goose Optimization Algorithm (BGGOA) for optimal feature selection. This dynamic approach enhances the model's ability to adapt to varying data characteristics, improving detection performance. The proposed solution is divided into two stages: pre-deployment and deployment. During the pre-deployment stage, the dataset is preprocessed, including data transformation, handling irrelevant and redundant data, and ensuring data balancing. Minority samples are increased using CWGAN to avoid class imbalance. Features are then selected using BGGOA, resulting in a feature-reduced dataset used for training and testing ensemble deep learning classifiers, specifically the Novel Pyramid Depth-wise Separable-MobileNetV3 (PyDS-MV3) and Deformable Convolutional Residual Neural Network (DCRNN), termed PDSMV3-DCRNN. During the deployment phase, the Boosted ConvNeXt approach extracts URL features fed into the trained classifier to predict "phishing" or "benign". According to experimental findings, the proposed solution outperforms all other tested approaches, displaying a faster training time of 0.11 s and achieving an optimal accuracy of 99.21%.
期刊介绍:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.