Effective ensemble learning phishing detection system using hybrid feature selection
Aaron Connolly, Hany F. Atlam
Journal of Network and Computer Applications, Volume 242, Article 104251 (published 30 June 2025)
DOI: 10.1016/j.jnca.2025.104251
https://www.sciencedirect.com/science/article/pii/S1084804525001481
Journal impact factor: 7.7 (JCR Q1, Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Phishing emails pose a significant threat to individuals and organisations, and traditional detection methods struggle to keep pace with increasingly sophisticated attacks. Conventional Machine Learning (ML) approaches in particular often fail to reach satisfactory accuracy as phishing techniques evolve. Mitigating this challenge calls for a more advanced detection system built on stronger ML techniques. This paper therefore proposes a novel stacking ensemble learning approach that leverages hybrid feature selection. The proposed model combines the predictions of multiple ML algorithms, each using a different subset of features drawn from the email header, body, and URLs. Drawing on all three parts of the email allows the model to capture a wide range of characteristics that distinguish phishing emails from legitimate ones. Extensive experiments were conducted to evaluate the proposed model. It achieves an accuracy of 99.53% and an F1-measure of 0.9955, exceeding the best accuracy obtained by any individual ML algorithm (99.10%) and outperforming the most effective phishing detection systems reported in recent literature. This 0.43 percentage-point gain over the best single classifier, which reduces the error rate from 0.90% to 0.47%, highlights the efficacy of ensemble learning in this domain. The gain costs only a 1.6 ms increase in detection time, making the model practical for real-world applications. By demonstrating the effectiveness of ensemble learning combined with hybrid feature selection, this paper contributes to the field of phishing detection.
The proposed model offers a practical and effective solution to the problem of phishing, with the potential to significantly reduce the number of malicious emails reaching users’ inboxes.
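The stacking approach described in the abstract can be illustrated with a small, dependency-free Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the base learners, the meta-learner, or the individual features, so here every learner is a hand-rolled logistic regression, and the six features, the `VIEWS` column split, and the toy data are all hypothetical. Each base learner sees only its own feature view (header, body, or URL); the meta-learner is trained on out-of-fold base predictions, a standard stacking practice that keeps it from learning to over-trust base models' confidence on their own training emails.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical column layout: which feature indices come from which part of the email.
VIEWS: Dict[str, List[int]] = {"header": [0, 1], "body": [2, 3], "url": [4, 5]}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    """Minimal batch-gradient-descent logistic regression (stand-in for a real ML library)."""

    def __init__(self, lr: float = 0.5, epochs: int = 500) -> None:
        self.lr, self.epochs = lr, epochs
        self.w: Vector = []
        self.b = 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "LogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.proba(xi) - yi  # gradient of log-loss w.r.t. the logit
                for j in range(d):
                    gw[j] += err * xi[j]
                gb += err
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def proba(self, x: Vector) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)


def select(x: Vector, cols: List[int]) -> Vector:
    return [x[c] for c in cols]


class StackedPhishingDetector:
    """Stacking: one base learner per feature view, a meta-learner over their outputs."""

    def __init__(self, views: Dict[str, List[int]] = VIEWS, k: int = 3) -> None:
        self.views, self.k = views, k
        self.base: Dict[str, LogisticRegression] = {}
        self.meta = LogisticRegression()

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "StackedPhishingDetector":
        n = len(X)
        # 1. Out-of-fold base predictions: each email is scored by models
        #    that never saw it during training.
        oof = [[0.0] * len(self.views) for _ in range(n)]
        for fold in range(self.k):
            train = [i for i in range(n) if i % self.k != fold]
            held_out = [i for i in range(n) if i % self.k == fold]
            for v, cols in enumerate(self.views.values()):
                model = LogisticRegression().fit(
                    [select(X[i], cols) for i in train], [y[i] for i in train])
                for i in held_out:
                    oof[i][v] = model.proba(select(X[i], cols))
        # 2. The meta-learner learns how much to trust each view.
        self.meta.fit(oof, y)
        # 3. Refit each base learner on all data for use at prediction time.
        self.base = {name: LogisticRegression().fit([select(x, cols) for x in X], y)
                     for name, cols in self.views.items()}
        return self

    def proba(self, x: Vector) -> float:
        meta_x = [self.base[name].proba(select(x, cols)) for name, cols in self.views.items()]
        return self.meta.proba(meta_x)

    def predict(self, x: Vector) -> int:
        return int(self.proba(x) >= 0.5)


# Hypothetical toy features (not from the paper):
# [reply_to_mismatch, spf_fail, urgency_ratio, has_form, ip_in_url, link_density]
legit = [[0, 0, 0.1, 0, 0, 0.2], [0, 0, 0.2, 0, 0, 0.1], [0, 1, 0.1, 0, 0, 0.3],
         [0, 0, 0.3, 0, 0, 0.2], [1, 0, 0.2, 0, 0, 0.1], [0, 0, 0.0, 0, 0, 0.0]]
phish = [[1, 1, 0.9, 1, 1, 0.8], [1, 0, 0.8, 1, 1, 0.9], [1, 1, 0.7, 0, 1, 0.7],
         [0, 1, 0.9, 1, 0, 0.9], [1, 1, 0.8, 1, 1, 0.6], [1, 1, 0.6, 1, 1, 0.8]]
detector = StackedPhishingDetector().fit(legit + phish, [0] * 6 + [1] * 6)
print(detector.predict([1, 1, 0.9, 1, 1, 0.9]))  # clear phishing profile
print(detector.predict([0, 0, 0.1, 0, 0, 0.1]))  # clear legitimate profile
```

The toy header view is deliberately noisy (one legitimate email fails SPF, one phishing email does not), so no single view is perfect on its own; the meta-learner can lean on the body and URL views for those cases. In a real system each base learner would likely be a stronger model than logistic regression, but the out-of-fold step is what makes the stack sound regardless of which learners are plugged in.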
About the journal:
The Journal of Network and Computer Applications welcomes research contributions, surveys, and notes in all areas relating to computer networks and their applications. Sample topics include new design techniques, interesting or novel applications, components or standards; computer networks with tools such as the WWW; emerging standards for internet protocols; wireless networks; mobile computing; emerging computing models such as cloud computing and grid computing; applications of networked systems for remote collaboration and telemedicine, etc. The journal is abstracted and indexed in Scopus, Engineering Index, Web of Science, Science Citation Index Expanded and INSPEC.