Kibreab Adane, Berhanu Beyene, Mohammed Abebe
Journal of Artificial Intelligence and Technology, published 2023-09-30. DOI: 10.37965/jait.2023.0269 (https://doi.org/10.37965/jait.2023.0269)
ML and DL-based Phishing Website Detection: The Effects of Varied Size Datasets and Informative Feature Selection Techniques
Using the Internet for communication, teamwork, and other productive activities requires interacting with webpages and websites. However, phishing websites look benign, and not all visitors have the knowledge and skills needed to judge whether a site is trustworthy. As a result, they are tricked into disclosing sensitive information and are left vulnerable to malicious software attacks such as ransomware. One of the core challenges in combating phishing websites is that attackers cannot be stopped from creating them. The threat can nevertheless be alleviated by detecting that a specific website is a phishing site and alerting users to take the necessary precautions before handing over sensitive information. In this study, five machine learning (ML) and deep learning (DL) algorithms (CATB, GB, RF, MLP, and DNN) were tested with three reputable datasets and two feature selection techniques to assess the scalability and consistency of each classifier's performance on datasets of varied size. The experimental findings show that the CATB classifier achieved the best accuracy on all three datasets (DS-1, DS-2, and DS-3), with 97.9%, 95.73%, and 98.83%, respectively. The GB classifier achieved the second-best accuracy, with 97.16%, 95.18%, and 98.58%, respectively. MLP had the shortest computational time on all three datasets (2, 7, and 3 seconds, respectively) despite scoring the lowest accuracy on each.
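The comparison described in the abstract, where several classifiers are scored on both accuracy and computational time, can be illustrated with a small benchmarking harness. The following is a minimal sketch only, not the authors' code. The two toy classifiers, the `benchmark` helper, and the five-row dataset are hypothetical stand-ins invented for illustration. Measuring time as fit plus predict is also an assumption, since the abstract does not say how computational time was measured. Any model exposing `fit` and `predict` methods could be plugged in the same way.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighborClassifier:
    """Hypothetical stand-in model: 1-nearest neighbour, squared Euclidean distance."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def nearest_label(row):
            dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in self.X_]
            return self.y_[dists.index(min(dists))]

        return [nearest_label(row) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def benchmark(models, X_train, y_train, X_test, y_test):
    """Fit each model, then record test accuracy and wall-clock time.

    Assumption: "time" here covers fit plus predict.
    """
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        elapsed = time.perf_counter() - start
        results[name] = {"accuracy": accuracy(y_test, y_pred), "seconds": elapsed}
    return results


# Toy data, made up for illustration: two numeric features, label 1 = phishing.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.15, 0.25]]
y_train = [0, 0, 1, 1, 0]
X_test = [[0.05, 0.1], [0.95, 0.9], [0.85, 0.7], [0.2, 0.3]]
y_test = [0, 1, 1, 0]

models = {"majority": MajorityClassifier(), "1-nn": NearestNeighborClassifier()}
results = benchmark(models, X_train, y_train, X_test, y_test)
for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.2f}, time={r['seconds']:.4f}s")
```

Reporting accuracy and elapsed time side by side makes the trade-off noted in the abstract visible: a model can be the fastest while also being the least accurate.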