Phishing Website Detection Using LGBM Classifier With URL-Based Lexical Features

Thahira A, Ansamma John
Published in: 2022 IEEE Silchar Subsection Conference (SILCON)
Publication date: 2022-11-04
DOI: 10.1109/SILCON55242.2022.10028793 (https://doi.org/10.1109/SILCON55242.2022.10028793)
Citations: 1

Abstract

People increasingly rely on the internet for everyday activities, yet most internet users are unaware of phishing attacks, which exploit human carelessness or lack of awareness and result in financial loss as well as identity theft. Phishing detection is not a new research area; it has been approached from many perspectives, and machine learning-based approaches have gained popularity among existing works. Existing machine learning-based models use a large number of features, including external service-based features, for phishing website detection, but these features are time-consuming to obtain and are not suitable for real-time detection. This research proposes a fast and lightweight phishing website detection model based on URL-based lexical features, as well as a novel dataset. The experimental results show that the parameter-tuned light gradient boosting machine (LGBM) classifier outperforms the random forest classifier, and that the Pearson correlation algorithm outranks the other feature selection methods. The proposed approach outperforms most existing methods in terms of accuracy on the newly created dataset as well as on two other benchmark datasets.
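As a rough illustration of what "URL-based lexical features" and Pearson-correlation feature ranking might look like in practice, here is a minimal Python sketch. The abstract does not list the paper's actual features, so every feature name below (`url_length`, `num_dots`, and so on) and the helper names `lexical_features`, `pearson`, and `rank_features` are hypothetical choices made for illustration, not the authors' method.

```python
import math
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict[str, float]:
    """Compute a few illustrative features from the URL string alone.

    No network lookups are made. The feature set is an assumption for
    this sketch, not the set used in the paper.
    """
    # Add a scheme if it is missing so that urlsplit can locate the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots": float(url.count(".")),
        "num_hyphens": float(url.count("-")),
        "num_digits": float(sum(c.isdigit() for c in url)),
        "has_at_symbol": float("@" in url),
        "path_depth": float(parts.path.count("/")),
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return cov / (norm_x * norm_y)


def rank_features(urls: list[str], labels: list[float]) -> list[tuple[str, float]]:
    """Rank features by absolute Pearson correlation with the label (1 = phishing)."""
    rows = [lexical_features(u) for u in urls]
    scores = {
        name: abs(pearson([row[name] for row in rows], labels))
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In a pipeline of the kind the abstract describes, the top-ranked features would then be passed to a gradient-boosted classifier (for example `lightgbm.LGBMClassifier`) and compared against a random forest baseline; the hyperparameters and dataset used in the paper are not given here, so they are left out of the sketch.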