An Unsupervised Feature Selection for Web Phishing Data using an Evolutionary Approach

2021 7th International Conference on Web Research (ICWR), published 2021-05-19. DOI: 10.1109/ICWR51868.2021.9443148

Motahare Akhavan, Seyed Mohammad Hossein Hasheminejad



Abstract

Phishing is one of the most serious cybercrimes: fraudsters use it to steal the identities and financial information of individuals and organizations. Its most common form is phishing through fake websites. In recent years, machine-learning-based phishing detection methods have attracted attention because of their high accuracy. Feature selection is a preprocessing step in data mining and machine learning that reduces the size of the feature space and identifies significant features while achieving comparable or higher accuracy. In this paper, we propose LAPPSO, an unsupervised feature selection method for web phishing data. To find the most informative features, LAPPSO applies an improved version of particle swarm optimization (PSO) with greater exploration to strengthen the global search, and uses the Laplacian score for local search. In experiments on two well-known phishing datasets, our algorithm achieves an average F-measure of 96% while significantly reducing the number of features. Moreover, using the selected features cuts the training time of the learning model almost in half.
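The Laplacian score used for LAPPSO's local search is, as commonly defined, an unsupervised per-feature criterion. It builds a k-nearest-neighbour graph over the samples with heat-kernel weights S, forms the degree matrix D and the graph Laplacian L = D - S, and scores each feature f as (f~ᵀ L f~) / (f~ᵀ D f~), where f~ is f with its D-weighted mean removed. Lower scores mark features that better preserve the local neighbourhood structure of the data. The sketch below is a plain-Python illustration of that standard definition only. It is not the authors' LAPPSO implementation: the abstract does not say how the score is combined with PSO, and the neighbour count `k`, kernel width `t`, and the helper names `laplacian_scores` and `rank_features` are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def laplacian_scores(X: Sequence[Sequence[float]], k: int = 5, t: float = 1.0) -> List[float]:
    """Return one Laplacian score per feature column of X (lower = better).

    k: neighbours per sample in the kNN graph (hypothetical default).
    t: heat-kernel width; should be on the scale of squared distances in X.
    """
    n, d = len(X), len(X[0])
    # Pairwise squared Euclidean distances between samples.
    dist = [[sum((X[i][f] - X[j][f]) ** 2 for f in range(d)) for j in range(n)]
            for i in range(n)]

    # Symmetric k-nearest-neighbour graph with heat-kernel edge weights S_ij.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            S[i][j] = S[j][i] = math.exp(-dist[i][j] / t)

    deg = [sum(row) for row in S]  # diagonal of the degree matrix D
    total = sum(deg)
    if total == 0:
        raise ValueError("all graph weights underflowed; increase t")

    scores = []
    for f in range(d):
        col = [X[i][f] for i in range(n)]
        # Remove the D-weighted mean: f~ = f - (f^T D 1 / 1^T D 1) 1.
        mean = sum(deg[i] * col[i] for i in range(n)) / total
        centred = [v - mean for v in col]
        # Denominator f~^T D f~ (degree-weighted variance).
        var = sum(deg[i] * centred[i] ** 2 for i in range(n))
        # Numerator f~^T L f~ with L = D - S, i.e. 1/2 * sum_ij S_ij (f~_i - f~_j)^2.
        smooth = 0.5 * sum(S[i][j] * (centred[i] - centred[j]) ** 2
                           for i in range(n) for j in range(n))
        # A constant feature has zero variance and carries no structure: rank it last.
        scores.append(smooth / var if var > 0 else math.inf)
    return scores


def rank_features(X: Sequence[Sequence[float]], k: int = 5, t: float = 1.0) -> List[int]:
    """Feature indices ordered from most to least locality-preserving."""
    scores = laplacian_scores(X, k, t)
    return sorted(range(len(scores)), key=lambda f: scores[f])
```

For example, on six samples forming two well-separated groups along feature 0 with unstructured values in feature 1, `rank_features(X, k=2, t=1.0)` places feature 0 first. The pure-Python loops are O(n²·d) and are meant only to make the formula concrete; a real phishing dataset would call for a vectorised or sparse-graph implementation.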
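The results above are reported as an average F-measure. As commonly defined, the F-measure (F1) is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), and the general F-beta form is (1 + β²)PR / (β²P + R). The sketch below computes it for a binary phishing/legitimate labelling. Treating label `1` as the phishing (positive) class is an assumption made for illustration, and the abstract does not state how the 96% figure is averaged (for example over datasets, folds, or classifiers), so this shows only the per-run metric.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int],
              positive: int = 1, beta: float = 1.0) -> float:
    """F-beta score for one class; beta=1 gives the usual F1 (harmonic mean of P and R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # phishing caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # phishing missed
    if tp == 0:
        # Precision or recall is zero (or undefined), so the score is 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With `y_true = [1, 1, 1, 0, 0, 0, 1, 0]` and `y_pred = [1, 1, 0, 0, 1, 0, 1, 0]` there are 3 true positives, 1 false positive and 1 false negative, so P = R = 0.75 and F1 = 0.75.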
