ML and DL-based Phishing Website Detection: The Effects of Varied Size Datasets and Informative Feature Selection Techniques

Journal of Artificial Intelligence and Technology. Published 2023-09-30. DOI: 10.37965/jait.2023.0269

Kibreab Adane, Berhanu Beyene, Mohammed Abebe



Abstract


One must interact with specific webpages or websites to use the Internet for communication, teamwork, and other productive activities. Phishing websites, however, look benign, and not all visitors have the knowledge and skills to assess the trustworthiness of the sites they visit. As a result, users can be tricked into disclosing sensitive information, which also leaves them vulnerable to malicious software attacks such as ransomware. One of the core challenges in combating phishing is that attackers cannot be prevented from creating phishing websites. The threat can nevertheless be mitigated by detecting a given website as phishing and alerting users to take the necessary precautions before handing over sensitive information. In this study, five machine learning (ML) and deep learning (DL) algorithms, namely CatBoost (CATB), Gradient Boosting (GB), Random Forest (RF), Multilayer Perceptron (MLP), and Deep Neural Network (DNN), were tested on three reputable datasets of different sizes together with two feature selection techniques, to assess the scalability and consistency of each classifier's performance across dataset sizes. The experimental findings show that the CATB classifier achieved the highest accuracy on all three datasets (DS-1, DS-2, and DS-3), at 97.9%, 95.73%, and 98.83%, respectively. The GB classifier achieved the second-highest accuracy on all three datasets, at 97.16%, 95.18%, and 98.58%, respectively. MLP had the shortest computation time on all three datasets (2, 7, and 3 seconds, respectively), although it scored the lowest accuracy on every dataset.
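The abstract does not name its two feature selection techniques or describe how accuracy and computation time were measured. As an illustration only, the sketch below assumes an information-gain (mutual information) filter over discrete-valued features, one common way to pick "informative" features, plus a helper that reports the two metrics the abstract compares. It is not the authors' code; every function name is hypothetical, and timing only the training step is an assumption.

```python
import math
import time
from collections import Counter
from typing import Callable, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[int], labels: Sequence[int]) -> float:
    """IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v).

    Assumes the feature X takes a small number of discrete values.
    """
    n = len(labels)
    groups: dict[int, list[int]] = {}
    for x, y in zip(column, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(rows: Sequence[Sequence[int]], labels: Sequence[int], k: int) -> list[int]:
    """Indices of the k features with the highest information gain (ties go to the lower index)."""
    scores = [
        (information_gain([row[j] for row in rows], labels), j)
        for j in range(len(rows[0]))
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(train_fn: Callable, predict_fn: Callable, X_train, y_train, X_test, y_test):
    """Return (accuracy, training seconds); only train_fn is timed (an assumption)."""
    start = time.perf_counter()
    model = train_fn(X_train, y_train)
    train_seconds = time.perf_counter() - start
    y_pred = predict_fn(model, X_test)
    return accuracy(y_test, y_pred), train_seconds
```

The `evaluate` helper takes plain callables so that any of the five classifiers could be plugged in. It is kept library-free on purpose, so no CatBoost or scikit-learn API is assumed here.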
