Automatic phishing website detection and prevention model using transformer deep belief network

IF 4.8 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computers & Security · Pub Date: 2024-08-25 · DOI: 10.1016/j.cose.2024.104071

Amol Babaso Majgave, Nitin L. Gavankar

Computers & Security, Volume 147, Article 104071. https://www.sciencedirect.com/science/article/pii/S0167404824003766

Citations: 0

Abstract

In a digitally connected world, cybersecurity is paramount, and phishing, in which attackers pose as trusted entities to steal sensitive data, looms large. The proliferation of phishing attacks on the internet poses a substantial threat to individuals and organizations, compromising sensitive information and causing financial and reputational damage. The goal of this study is to establish an automated system for the early detection and prevention of phishing websites, thereby enhancing online security and protecting users from cyber threats. The research first applies a One Hot Encoding (OHE)-based pre-processing step that converts every URL string into a numerical vector of a given dimension. It then uses two feature selection techniques, transfer learning-based feature extraction with DarkNet19 and a Variational Autoencoder (VAE), to select the most important features. Robust security mechanisms are presented to prevent phishing attacks and safeguard personal information on websites, and list-based and deep learning-based systems are used to prevent and detect phishing URLs more efficiently. The study proposes a transformer-based Deep Belief Network (TB-DBN), a pre-trained deep transformer network model for detecting phishing behaviour. A cross-validation technique with grid-search hyper-parameter tuning based on the Intelligence Binary Bat Algorithm (IBBA) was designed for the proposed hybrid model. Phishing URLs were classified with a probabilistic estimation-guided boosting classifier, and performance was evaluated in terms of accuracy, precision, recall, specificity, and F1-score. The risk level associated with a URL is assessed from factors such as the source's reputation, content analysis results, and behavioural anomalies. The computational complexity of training the DL model depends on factors such as the model's complexity, the size of the training data, and the optimization algorithm used for training. The results show that tuning the variables increases the effectiveness of Python-based deep learning systems. The proposed method achieves an accuracy of 99.4%, a precision of 99.2%, a recall of 99.3%, and an F1-score of 99.2%. This automatic phishing website detection and prevention model, based on a transformer-based Deep Belief Network, offers high accuracy and adaptability, strengthening cybersecurity measures to safeguard sensitive user information and mitigate the substantial threat of phishing attacks in a digitally connected world.
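
Two steps named in the abstract, the one-hot encoding of URL strings and the evaluation metrics, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the character set, the vector dimension, or any confusion-matrix counts, so `ALPHABET`, `MAX_LEN`, and both function names below are hypothetical choices made only for illustration.

```python
from typing import Dict, List

# Hypothetical character set and length; the abstract specifies neither.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
MAX_LEN = 64


def one_hot_encode_url(url: str, alphabet: str = ALPHABET,
                       max_len: int = MAX_LEN) -> List[List[int]]:
    """Encode a URL as a max_len x (len(alphabet) + 1) matrix of 0/1 values.

    The last column marks characters outside the alphabet. URLs longer than
    max_len are truncated; shorter ones are padded with all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    width = len(alphabet) + 1
    unknown = width - 1
    rows: List[List[int]] = []
    for ch in url.lower()[:max_len]:
        row = [0] * width
        row[index.get(ch, unknown)] = 1
        rows.append(row)
    padding = max_len - len(rows)
    rows.extend([0] * width for _ in range(padding))
    return rows


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the five metrics listed in the abstract from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    matrix = one_hot_encode_url("http://example.com/login")
    print(len(matrix), len(matrix[0]))
    # Illustrative counts only, not results from the paper.
    print(classification_metrics(tp=8, fp=2, tn=6, fn=4))
```

Flattening the matrix row by row gives a single numerical vector of dimension `MAX_LEN * (len(ALPHABET) + 1)`, which is one plausible reading of "a numerical vector of a given dimension"; the paper itself may use a different scheme.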

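Source journal

Computers & Security (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 12.40
Self-citation rate: 7.10%
Articles published: 365
Review time: 10.7 months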

Journal description: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.