A hybrid deep learning technique for spoofing website URL detection in real-time applications

Journal of Electrical Systems and Information Technology. Pub Date: 2024-01-24. DOI: 10.1186/s43067-023-00128-8

Bridget C. Ujah-Ogbuagu, Oluwatobi Noah Akande, Emeka Ogbuju


Citations: 0

Abstract



A hybrid deep learning technique for spoofing website URL detection in real-time applications

Website Uniform Resource Locator (URL) spoofing remains one of the ways of perpetrating phishing attacks in the twenty-first century. Hackers continue to employ URL spoofing to deceive naïve and unsuspecting consumers into releasing important personal details on malicious websites. Blacklists and rule-based filters that were once effective at reducing the risks and sophistication of phishing are no longer effective, as over 1.5 million new phishing websites are created monthly. Therefore, research aimed at unveiling new techniques for detecting phishing websites has sparked a lot of interest in both academia and business, with machine learning and deep learning techniques at the forefront. Among the deep learning techniques that have been employed, the Convolutional Neural Network (CNN) remains one of the most widely used, with high performance in feature learning. However, a CNN has difficulty retaining contextual relationships in URL text, which makes it challenging to efficiently detect sophisticated malicious URLs in real-time applications. In contrast, the Long Short-Term Memory (LSTM) deep learning model has been successfully employed in complex real-time problems because of its ability to retain inputs over long periods. This study experiments with a hybrid of CNN and LSTM deep learning models for spoofing website URL detection in order to exploit the combined strengths of the two approaches. Two publicly available datasets (the UCL spoofing Website and PhishTank datasets) were used to evaluate the performance of the proposed hybrid model against other models in the literature. The hybrid CNN-LSTM model achieved accuracies of 98.9% and 96.8% on the UCL and PhishTank datasets, respectively. By comparison, the standalone CNN and LSTM achieved accuracies of 90.4% and 94.6% on the UCL dataset and 89.3% and 92.6% on the PhishTank dataset, respectively. The results show that the hybrid CNN-LSTM model outperformed the standalone CNN and LSTM models by 4.2 to 8.5 percentage points. Therefore, the hybrid deep learning technique is recommended for detecting spoofing website URLs, thereby reducing losses attributed to such attacks.
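To make the reported comparison concrete, the short Python sketch below tabulates the accuracies quoted in the abstract and computes the hybrid model's lead over each standalone model in percentage points. The figures are copied from the abstract and nothing is re-measured; the names `REPORTED_ACCURACY` and `hybrid_margins` are hypothetical and introduced here only for illustration.

```python
# Accuracies (%) exactly as quoted in the abstract; this is a tabulation,
# not a re-run of the authors' experiments.
REPORTED_ACCURACY = {
    "UCL": {"CNN": 90.4, "LSTM": 94.6, "CNN-LSTM": 98.9},
    "PhishTank": {"CNN": 89.3, "LSTM": 92.6, "CNN-LSTM": 96.8},
}


def hybrid_margins(table, hybrid="CNN-LSTM"):
    """Return the hybrid model's lead over each standalone model.

    Leads are absolute differences in percentage points, rounded to one
    decimal place to match the precision of the reported figures.
    """
    margins = {}
    for dataset, scores in table.items():
        margins[dataset] = {
            model: round(scores[hybrid] - accuracy, 1)
            for model, accuracy in scores.items()
            if model != hybrid
        }
    return margins


if __name__ == "__main__":
    for dataset, leads in hybrid_margins(REPORTED_ACCURACY).items():
        for model, lead in leads.items():
            print(f"{dataset}: CNN-LSTM leads {model} by {lead} points")
```

On these figures the hybrid leads the standalone CNN by 8.5 and 7.5 points and the standalone LSTM by 4.3 and 4.2 points on the UCL and PhishTank datasets, respectively. These are absolute differences in accuracy, not relative improvements.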

