Comparative Analysis of Ensemble Learning and Non-Ensemble Machine Learning Algorithms for Phishing URL Detection
Chiamaka M. Igwilo, Victor Odumuyiwa
FUOYE Journal of Engineering and Technology, published 2022-09-14. DOI: 10.46792/fuoyejet.v7i3.807
Phishing is a social engineering attack that has been perpetrated for a long time and remains prominent, with a high number of victims. Through phishing, attackers can easily gain access to sensitive information about a company or an individual. This research compares the importance of four feature categories for detecting phishing URLs: lexical features, domain-name-based features, HTML features, and URL tokenization. Experiments were designed to compare the effectiveness of each feature category, used separately, on three non-ensemble machine learning models (K-Nearest Neighbour, Decision Tree, Logistic Regression) and five ensemble learning classifiers (Random Forest, Bagging, Stacking, AdaBoost, Gradient Boosting). URL tokenization combined with the stacking classifier gave higher accuracy than the other configurations, with accuracy scores of 96% and 99.3% on the two datasets used, respectively. Future work will use more datasets with larger sample sizes to provide a basis for generalisation.
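Two of the feature categories named in the abstract, lexical features and URL tokenization, can be computed from the URL string alone. The sketch below illustrates them with Python's standard library. It is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `tokenize_url` and `lexical_features` are hypothetical, and the particular features chosen (URL and host length, dot, hyphen and digit counts, an `@` symbol, an IP-address host, HTTPS, path depth) are common choices assumed here, since the abstract does not list the exact features used.

```python
from __future__ import annotations

import ipaddress
import re
from urllib.parse import urlparse

# Assumed tokenization rule: split on any run of non-alphanumeric characters
# such as '/', '.', '-', '?', '=', '@'.
TOKEN_SPLIT = re.compile(r"[^A-Za-z0-9]+")


def tokenize_url(url: str) -> list[str]:
    """Hypothetical helper: return the non-empty, lowercased tokens of a URL."""
    return [tok.lower() for tok in TOKEN_SPLIT.split(url) if tok]


def _host_is_ip(host: str) -> bool:
    """True if the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict[str, int]:
    """Hypothetical lexical feature set computed from the URL string only."""
    # urlparse only fills in the host when a scheme is present, so schemeless
    # URLs are assumed to be plain http.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(_host_is_ip(host)),
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": parsed.path.count("/"),
    }


if __name__ == "__main__":
    # Illustrative URL only; not taken from the paper's datasets.
    sample = "http://192.168.0.1/secure-login/verify.php?user=abc@example.com"
    print(tokenize_url(sample))
    print(lexical_features(sample))
```

Either representation could then feed the classifiers compared in the abstract. In scikit-learn, for example, `tokenize_url` could be passed as the `tokenizer` argument of `CountVectorizer` to build a bag-of-tokens matrix, and `StackingClassifier` could combine `KNeighborsClassifier`, `DecisionTreeClassifier` and `LogisticRegression` base learners under a meta-learner. Whether the paper configured its stacking model this way is not stated in the abstract.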