Deep learning based phishing website identification system using CNN-LSTM classifier

IF 0.7 Q3 INFORMATION SCIENCE & LIBRARY SCIENCE

JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES Pub Date : 2023-01-01 DOI:10.47974/jios-1343

Vinod Sapkal, Praveen Gupta, Aboo Bakar Khan


Citations: 0

Abstract

Phishing is an attack in which a fraudulent website impersonates that of a large organization, typically one that handles money, such as a bank, another financial institution, or an online retailer. Its primary objective is to obtain personally identifiable information from users, such as social security numbers, credit card details, and passwords. As phishing attacks have increased, various techniques have been developed to combat them. Among these are deep learning algorithms, which can learn from and analyze massive datasets and are therefore useful for identifying and preventing phishing attacks. Because phishing websites are complex, many systems have been developed to detect them. Unfortunately, these systems do not achieve the desired output, and they have a number of other flaws as well. This paper proposes a hybrid deep learning-based phishing detection system that is easy to put into practice. The input dataset is first preprocessed to improve its quality. Clustering and feature selection are then carried out to improve accuracy and reduce processing time. The resulting features are fed into the CNN-LSTM classifier, which labels websites as phishing or legitimate. The proposed hybrid deep learning models combine features from natural language processing (NLP) with character embedding, which allows them to reveal high-level connections between characters. On the evaluation metric used, the proposed models perform better than the other models.
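The abstract mentions character embedding but does not say how the input text is encoded. As an illustration only, the sketch below shows one common way to turn a string into the fixed-length integer sequence that a character-embedding layer typically consumes. It assumes the input is a URL. The alphabet, the `encode_url` helper, and the padding and unknown-character indices are hypothetical choices for this example and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical vocabulary (an assumption, not from the paper): lowercase
# letters, digits, and punctuation that commonly appears in URLs.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"

# Index 0 is reserved for padding and index 1 for unknown characters,
# so real characters start at index 2.
PAD, UNK = 0, 1
CHAR_TO_INDEX: Dict[str, int] = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_url(url: str, max_len: int = 32) -> List[int]:
    """Map a URL to a fixed-length list of character indices.

    The URL is lowercased, truncated to max_len characters, and
    right-padded with PAD so every output has exactly max_len entries.
    """
    indices = [CHAR_TO_INDEX.get(ch, UNK) for ch in url.lower()[:max_len]]
    return indices + [PAD] * (max_len - len(indices))


if __name__ == "__main__":
    # Each position holds the index of one character; trailing zeros are padding.
    print(encode_url("http://example.com/login", max_len=32))
```

An embedding layer would then map each index to a dense vector, and the resulting sequence would serve as input to the convolutional and recurrent layers. The vocabulary size for such a layer would be `len(ALPHABET) + 2` under these assumptions.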


Source journal

JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES INFORMATION SCIENCE & LIBRARY SCIENCE-

Self-citation rate

21.40%

Articles published