Motahare Akhavan, Seyed Mohammad Hossein Hasheminejad
An Unsupervised Feature Selection for Web Phishing Data using an Evolutionary Approach
2021 7th International Conference on Web Research (ICWR), published 2021-05-19
DOI: 10.1109/ICWR51868.2021.9443148
Citations: 6
Abstract
Phishing is one of the most serious cybercrimes, used by fraudsters to steal the identities and financial information of individuals and organizations. Its most common form is phishing through fake websites. In recent years, phishing detection methods based on machine learning have gained attention due to their high accuracy. Feature selection is a preprocessing step in data mining and machine learning that reduces the size of the feature space and identifies significant features while achieving comparable or higher accuracy. In this paper, an unsupervised feature selection method, called LAPPSO, is proposed for web phishing data. To find the most informative features, LAPPSO applies an improved version of particle swarm optimization (PSO) with greater exploration to strengthen the global search, and uses the Laplacian score for local search. In experiments on two well-known phishing datasets, LAPPSO achieves an average F-measure of 96% while significantly reducing the number of features. Moreover, the selected features reduce the training time of the learning model to almost half.
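
To make the Laplacian score mentioned in the abstract more concrete, here is a minimal Python sketch of how such a score is commonly computed for unlabeled data. It is not the authors' LAPPSO implementation, and the abstract does not say how the score is combined with PSO. The function name `laplacian_scores`, the parameters `n_neighbors` and `t`, and the choice of a symmetric k-nearest-neighbor graph with heat-kernel weights are all assumptions made for illustration.

```python
import numpy as np


def laplacian_scores(X, n_neighbors=5, t=1.0):
    """Return one Laplacian score per column of X.

    Hypothetical helper for illustration: lower scores indicate features
    that vary little between neighboring samples relative to their
    overall (degree-weighted) variance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(n_neighbors, n - 1)

    # Pairwise squared Euclidean distances; the diagonal is set to inf so
    # a sample is never selected as its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)

    # Symmetric k-nearest-neighbor graph with heat-kernel edge weights
    # (assumed graph construction; `t` should match the scale of the data).
    nearest = np.argsort(d2, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nearest.ravel()] = True
    mask = mask | mask.T
    S = np.where(mask, np.exp(-d2 / t), 0.0)

    d = S.sum(axis=1)       # node degrees
    L = np.diag(d) - S      # graph Laplacian

    # Remove the degree-weighted mean from every feature.
    Xc = X - (d @ X) / d.sum()

    num = np.sum(Xc * (L @ Xc), axis=0)           # f^T L f per feature
    den = np.sum(Xc * (d[:, None] * Xc), axis=0)  # f^T D f per feature

    # Constant features have zero weighted variance; give them the worst score.
    scores = np.full(X.shape[1], np.inf)
    np.divide(num, den, out=scores, where=den > 1e-12)
    return scores
```

Under these assumptions, calling `laplacian_scores(X)` on a samples-by-features matrix returns one value per feature, and sorting features by ascending score gives a ranking that a local search step could use to prefer or discard candidate features.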