An Unsupervised Feature Selection for Web Phishing Data using an Evolutionary Approach

Motahare Akhavan, Seyed Mohammad Hossein Hasheminejad
Published in: 2021 7th International Conference on Web Research (ICWR)
Publication date: 2021-05-19
DOI: 10.1109/ICWR51868.2021.9443148 (https://doi.org/10.1109/ICWR51868.2021.9443148)
Citations: 6

Abstract

Phishing is one of the most serious cybercrimes, used by fraudsters to steal the identities and financial information of individuals and organizations. Its most common form is phishing through fake websites. In recent years, machine-learning-based phishing detection methods have gained attention because of their high accuracy. Feature selection is a preprocessing step in data mining and machine learning that reduces the size of the feature space and identifies significant features while achieving comparable or higher accuracy. This paper proposes an unsupervised feature selection method for web phishing data, called LAPPSO. To find the most informative features, LAPPSO applies an improved version of particle swarm optimization (PSO) with greater exploration to strengthen the global search, and uses the Laplacian score for local search. In experiments on two well-known phishing datasets, LAPPSO achieves an average F-measure of 96% while significantly reducing the number of features. Moreover, training the learning model on the selected features takes almost half the time.
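The abstract does not spell out how LAPPSO computes the Laplacian score, so the following is only a minimal sketch of the standard formulation, not the authors' implementation. It builds a k-nearest-neighbour graph with heat-kernel weights and scores each feature by how well it preserves that local structure; lower scores are better. The function name `laplacian_score` and the parameters `n_neighbors` and `t` are assumptions chosen for illustration.

```python
import numpy as np


def laplacian_score(X, n_neighbors=5, t=1.0):
    """Return one Laplacian score per column of X (lower = better locality preservation).

    Illustrative sketch only; n_neighbors and t are assumed hyperparameters.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape

    # Pairwise squared Euclidean distances between samples (clipped at 0 for rounding).
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T), 0.0)

    # k-nearest-neighbour graph; the diagonal is masked so a sample is never its own neighbour.
    masked = dist2.copy()
    np.fill_diagonal(masked, np.inf)
    nearest = np.argsort(masked, axis=1)[:, :n_neighbors]
    adjacency = np.zeros((n_samples, n_samples), dtype=bool)
    adjacency[np.repeat(np.arange(n_samples), n_neighbors), nearest.ravel()] = True
    adjacency = adjacency | adjacency.T  # make the graph symmetric

    # Heat-kernel edge weights S, degree vector, and graph Laplacian L = D - S.
    S = np.where(adjacency, np.exp(-dist2 / t), 0.0)
    degree = S.sum(axis=1)
    L = np.diag(degree) - S

    # Remove the degree-weighted mean from every feature column.
    centred = X - (degree @ X) / degree.sum()

    # Score_r = (f_r^T L f_r) / (f_r^T D f_r), computed for all features at once.
    numerator = np.sum(centred * (L @ centred), axis=0)
    denominator = np.sum(centred * (degree[:, None] * centred), axis=0)

    # Features with (near-)zero weighted variance get the worst possible score.
    scores = np.full(n_features, np.inf)
    np.divide(numerator, denominator, out=scores, where=denominator > 1e-12)
    return scores
```

Calling `laplacian_score(X)` on a samples-by-features matrix and taking `np.argsort` of the result ranks features from most to least locality-preserving. No class labels are needed, which is what makes the criterion usable in an unsupervised setting such as the one described above.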