Phishing Attacks Detection using Machine Learning and Deep Learning Models

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA). Pub Date: 2022-03-01. DOI: 10.1109/CDMA54072.2022.00034

M. Aljabri, Samiha Mirza


Citations: 14

Abstract


As the number of internet users has grown rapidly, phishing attacks have become a significant threat: the attacker poses as a trusted entity in order to steal sensitive data, causing reputational damage, financial loss, and ransomware or other malware infections. Intelligent techniques, mainly Machine Learning (ML) and Deep Learning (DL), are increasingly applied in cybersecurity because they can learn from available data to extract useful insight and predict future events. This paper investigates the effectiveness of applying such approaches to detecting phishing websites. We used two separate datasets and selected the most highly correlated features, which comprised a combination of content-based, URL lexical-based, and domain-based features. A set of ML models was then applied, and a comparative performance evaluation was conducted. The results demonstrate the importance of feature selection in improving the models' performance. The analysis also aimed to identify the features that most influence the models in identifying phishing websites. In terms of classification performance, the Random Forest (RF) algorithm achieved the highest accuracy on both datasets.
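The abstract does not list the actual features, datasets, or selection threshold, so the following is only a minimal sketch of what correlation-based selection over URL lexical features might look like. The feature names, the four toy URLs, and the helper functions `lexical_features`, `pearson`, and `top_correlated` are all hypothetical and are not taken from the paper.

```python
import math
from urllib.parse import urlparse


def lexical_features(url):
    """Return a few illustrative URL lexical features (hypothetical choices)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "num_digits": sum(ch.isdigit() for ch in url),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_correlated(rows, labels, k):
    """Rank features by absolute correlation with the label and keep the top k."""
    names = list(rows[0])
    scores = {
        name: abs(pearson([row[name] for row in rows], labels))
        for name in names
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


# Toy, made-up samples: label 1 = phishing, 0 = legitimate.
SAMPLES = [
    ("https://example.com/login", 0),
    ("https://docs.example.org/guide", 0),
    ("http://secure-login-example.com.verify-account.test/update?id=12345", 1),
    ("http://user@192.0.2.10/bank/confirm", 1),
]

rows = [lexical_features(url) for url, _ in SAMPLES]
labels = [label for _, label in SAMPLES]
selected = top_correlated(rows, labels, k=3)
print(selected)
```

In a fuller pipeline, the selected columns would then be passed to a classifier such as scikit-learn's `RandomForestClassifier` and compared against other models; that step is omitted here to keep the sketch dependency-free.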

