A hybrid DNN-LSTM model for detecting phishing URLs.

IF 4.5 · Zone 3 (Computer Science) · JCR Q2: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computing & Applications Pub Date : 2023-01-01 Epub Date: 2021-08-08 DOI:10.1007/s00521-021-06401-z

Alper Ozcan, Cagatay Catal, Emrah Donmez, Behcet Senturk

Neural Computing & Applications, vol. 35, no. 7, pp. 4957-4973. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349600/pdf/

Citation count: 0

Abstract

Phishing is an attack that imitates the official websites of organizations such as banks, e-commerce companies, financial institutions, and government agencies. Phishing websites aim to obtain users' sensitive information, such as personal identification, social security numbers, passwords, e-mail addresses, credit card details, and other account information. Several anti-phishing techniques have been developed to cope with the increasing number of phishing attacks. Machine learning, and deep learning in particular, is now among the most important techniques for detecting and preventing phishing attacks because of its strong learning ability on massive datasets and its state-of-the-art results in many classification problems. Previously, two types of feature extraction techniques [i.e., character embedding-based and manual natural language processing (NLP) feature extraction] were used in isolation. Because researchers did not consolidate these features, the resulting performance was not remarkable. Unlike previous work, our study presents an approach that uses both feature extraction techniques, and we discuss how to combine them to make full use of the available data. This paper proposes hybrid deep learning models based on long short-term memory (LSTM) and deep neural network (DNN) algorithms for detecting phishing uniform resource locators (URLs) and evaluates the performance of the models on phishing datasets. The proposed hybrid deep learning models use both character embedding and NLP features, thereby simultaneously exploiting deep connections between characters and revealing NLP-based high-level connections. Experimental results show that the proposed models outperform the other phishing detection models in terms of accuracy.
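As a rough illustration of the two feature views named in the abstract, the sketch below builds both inputs for a single URL: a padded sequence of character indices (the kind of input a character-embedding layer consumes) and a short vector of hand-crafted features. Everything concrete here is an assumption made for illustration, not the paper's method: the vocabulary, the `max_len` default, the six features, the function names, and the idea that the character view feeds an LSTM branch while the hand-crafted view feeds a DNN branch.

```python
from __future__ import annotations

import string
from urllib.parse import urlparse

# Hypothetical vocabulary: printable ASCII characters.
# Index 0 is reserved for padding, index 1 for out-of-vocabulary characters.
PAD, UNK = 0, 1
VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}


def encode_chars(url: str, max_len: int = 200) -> list[int]:
    """Map a URL to a fixed-length list of character indices (truncate, then pad)."""
    ids = [VOCAB.get(ch, UNK) for ch in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def nlp_features(url: str) -> list[float]:
    """Compute a few illustrative hand-crafted features (not the paper's feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                         # total URL length
        float(sum(ch.isdigit() for ch in url)),  # number of digit characters
        float(host.count(".")),                  # dots in the host name
        float("-" in host),                      # hyphen present in the host name
        float("@" in url),                       # '@' present anywhere in the URL
        float(parsed.scheme == "https"),         # scheme is HTTPS
    ]


def hybrid_inputs(url: str, max_len: int = 200) -> tuple[list[int], list[float]]:
    """Return both views of one URL: (character indices, hand-crafted features).

    Assumed usage: the first element goes to an embedding + LSTM branch and the
    second to a dense (DNN) branch of a two-input model.
    """
    return encode_chars(url, max_len), nlp_features(url)


if __name__ == "__main__":
    chars, feats = hybrid_inputs("http://secure-login.example.com/update?id=42", max_len=64)
    print(len(chars), feats)
```

In a two-input network, the outputs of the two branches would typically be concatenated before a final classification layer; how the paper actually merges them is not stated in the abstract.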

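Source journal

Neural Computing & Applications (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 11.40

Self-citation rate: 8.30%

Articles published: 1280

Review time: 6.9 months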

About the journal: Neural Computing & Applications is an international journal that publishes original research and other information in the field of practical applications of neural computing and related techniques such as genetic algorithms, fuzzy logic, and neuro-fuzzy systems. All items relevant to building practical systems are within its scope, including but not limited to: adaptive computing, algorithms, applicable neural networks theory, applied statistics, architectures, artificial intelligence, benchmarks, case histories of innovative applications, fuzzy logic, genetic algorithms, hardware implementations, hybrid intelligent systems, intelligent agents, intelligent control systems, intelligent diagnostics, intelligent forecasting, machine learning, neural networks, neuro-fuzzy systems, pattern recognition, performance measures, self-learning systems, software simulations, supervised and unsupervised learning methods, and system engineering and integration. Featured contributions fall into several categories: Original Articles, Review Articles, Book Reviews, and Announcements.