XAI-PhD: Fortifying Trust of Phishing URL Detection Empowered by Shapley Additive Explanations
Mustafa Al-Fayoumi, Bushra Alhijawi, Q. Abu Al-haija, Rakan Armoush
International Journal of Online and Biomedical Engineering (iJOE), published 2024-08-08
DOI: 10.3991/ijoe.v20i11.49533
Citations: 0
Abstract
The rapid growth of the Internet has increased demand for online services, but this surge in online activity has also brought a growing threat: phishing attacks. Phishing is a type of cyberattack that combines social engineering with technical manipulation to steal sensitive information from unsuspecting users. There is therefore a growing need for dependable phishing URL detection models that identify phishing URLs accurately and with low prediction overhead. This study introduces XAI-PhD, a phishing detection method that combines machine learning (ML) with Shapley additive explanations (SHAP). XAI-PhD uses SHAP to quantify how much each feature contributes to the classifier's decisions. By retaining only the input features with the highest SHAP importance, the model evaluates just the most informative attributes, which is intended to make it compact and better able to generalize. XAI-PhD uses a lightweight gradient boosting machine as its classifier, and a series of experiments compares its performance against established baseline methods. In these experiments, XAI-PhD achieves an accuracy of 99.8% and an F1-score of 99%. XAI-PhD is also computationally efficient, with reported prediction times of 1.47 milliseconds and 18.5 microseconds per record.
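The abstract describes ranking features by their SHAP values and keeping only the most important ones. A minimal sketch of that selection step follows, under stated assumptions: it computes exact Shapley values by enumerating every feature coalition, replaces "absent" features with a single baseline vector (here, column means), and ranks features by mean absolute Shapley value. This is a swapped-in illustration, not the authors' pipeline; the SHAP library and tree-specific algorithms such as TreeSHAP compute these values far more efficiently for gradient-boosted trees. The feature names, the toy scoring function `toy_phishing_score`, the data, and the cutoff `k` are all hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

# A model maps one feature vector to a real-valued score (e.g. a phishing logit).
Model = Callable[[Sequence[float]], float]


def _masked(x: Sequence[float], baseline: Sequence[float], keep: set) -> List[float]:
    """Keep features in `keep` from x; replace every other feature by its baseline value."""
    return [x[j] if j in keep else baseline[j] for j in range(len(x))]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating every coalition (O(n * 2^n) model calls)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                gain = model(_masked(x, baseline, coalition | {i})) - model(_masked(x, baseline, coalition))
                phi[i] += weight * gain
    return phi


def mean_abs_shap(model: Model, rows: Sequence[Sequence[float]], baseline: Sequence[float]) -> List[float]:
    """Global importance of each feature: mean |Shapley value| over the rows."""
    totals = [0.0] * len(baseline)
    for row in rows:
        for j, value in enumerate(shapley_values(model, row, baseline)):
            totals[j] += abs(value)
    return [t / len(rows) for t in totals]


def select_top_k(importances: Sequence[float], names: Sequence[str], k: int) -> List[str]:
    """Names of the k features with the largest importance, most important first."""
    ranked = sorted(range(len(names)), key=lambda j: importances[j], reverse=True)
    return [names[j] for j in ranked[:k]]


# --- Hypothetical usage: toy URL features and a hand-written scoring model ---
FEATURES = ["url_length", "num_dots", "has_at_symbol", "uses_https"]


def toy_phishing_score(f: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained classifier's raw score (not the paper's model)."""
    url_length, num_dots, has_at, uses_https = f
    return (0.01 * url_length + 0.3 * num_dots + 1.2 * has_at
            - 0.9 * uses_https + 0.5 * has_at * (1 - uses_https))


X = [[75, 4, 1, 0], [30, 2, 0, 1], [120, 6, 1, 0], [25, 1, 0, 1]]
baseline = [sum(col) / len(X) for col in zip(*X)]  # column means as the reference point
importances = mean_abs_shap(toy_phishing_score, X, baseline)
print(select_top_k(importances, FEATURES, k=2))  # ['has_at_symbol', 'uses_https']
```

Brute-force enumeration needs O(n·2^n) model calls, so it is only practical for a handful of features; in a pipeline like the one the abstract outlines, the selected feature names would then define the reduced input on which the classifier is retrained.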