Phishing Website Detection Using LGBM Classifier With URL-Based Lexical Features
Authors: Thahira A, Ansamma John
Venue: 2022 IEEE Silchar Subsection Conference (SILCON)
Publication date: 2022-11-04
DOI: https://doi.org/10.1109/SILCON55242.2022.10028793
Citation count: 1
Abstract
People have become increasingly reliant on the internet for their everyday activities, yet the majority of internet users are unaware of phishing attacks, which exploit human carelessness or lack of awareness and result in financial loss as well as identity theft. Phishing detection is not a new research area; multiple efforts from various perspectives have been made in this domain, and machine learning-based approaches have gained popularity among existing works. Machine learning-based models typically use a large number of features for phishing website detection, including features that depend on external services, but such features are time-consuming to obtain and are not suitable for real-time detection. This research proposes a fast and lightweight phishing website detection model based on URL-based lexical features, as well as a novel dataset. The experimental results show that the parameter-tuned light gradient boosting machine (LGBM) classifier outperforms the random forest classifier, and that the Pearson correlation algorithm outperforms the other feature selection methods. The proposed approach outperforms most of the existing methods in terms of accuracy on the newly created dataset as well as on two other benchmark datasets.
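To make the idea of URL-based lexical features and Pearson-based feature ranking concrete, the following is a minimal sketch using only the Python standard library. The abstract does not list the features used, so the feature names below (`url_length`, `num_dots`, and so on) and the helper functions `lexical_features`, `pearson`, and `rank_features` are illustrative assumptions, not the authors' feature set or implementation. The sketch computes features from the URL string alone, without any external service lookup, and ranks them by the absolute Pearson correlation with the class label.

```python
import math
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few lexical features from the URL string alone.

    These features are hypothetical examples chosen for illustration;
    they are not taken from the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots": float(url.count(".")),
        "num_hyphens": float(url.count("-")),
        "num_digits": float(sum(ch.isdigit() for ch in url)),
        "has_at_symbol": float("@" in url),
        "uses_https": float(parsed.scheme == "https"),
    }


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(urls, labels):
    """Rank features by |Pearson correlation| with the label (1 = phishing)."""
    rows = [lexical_features(u) for u in urls]
    names = list(rows[0])
    scores = {
        name: abs(pearson([row[name] for row in rows], labels))
        for name in names
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a pipeline of the kind the abstract describes, the top-ranked features would then be passed to a classifier such as an LGBM model; that training step is omitted here because the abstract gives no hyperparameters or dataset details.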