Detecting Phishing Websites using Decision Trees: A Machine Learning Approach

International Journal for Electronic Crime Investigation, Pub Date: 2023-07-03, DOI: 10.54692/ijeci.2023.0702155

Ashar Ahmed Fazal, Maryam Daud



Abstract

Phishing websites pose a significant threat to internet users, so a robust detection strategy is required. This study demonstrates the effectiveness of decision trees in identifying phishing websites and emphasises the value of feature selection and preprocessing in improving model performance. It uses the Phishing Websites Dataset from the UCI Machine Learning Repository, which contains 30 website-related features, together with a decision tree classifier from the scikit-learn package. The dataset is preprocessed to remove invalid and missing values, and the most pertinent features are selected for model training. 80% of the dataset is used to train the model, and the remaining 20% is used for testing. The decision tree classifier achieves an accuracy of 95.97%, and the confusion matrix shows a high true positive rate (96.64%) and a low false positive rate (3.04%). The method described here can help businesses and individuals protect themselves from phishing attacks, and the accompanying data visualisations make it easier to understand the dataset and assess the model.
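The workflow summarised in the abstract (feature selection, an 80/20 split, a scikit-learn decision tree, and rates read off the confusion matrix) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the data is a synthetic stand-in produced by `make_classification` rather than the UCI Phishing Websites Dataset, the abstract does not say how features were selected, so `SelectKBest` with `f_classif` and `k=15` are assumptions, and the helper `rates_from_confusion` is a hypothetical name introduced here. The printed numbers will not match the figures reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def rates_from_confusion(tn, fp, fn, tp):
    """Return (accuracy, true positive rate, false positive rate)."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    tpr = tp / (tp + fn)  # share of actual positives that were flagged
    fpr = fp / (fp + tn)  # share of actual negatives that were flagged
    return accuracy, tpr, fpr


# Synthetic stand-in data (assumption): 1000 samples, 30 features, binary label.
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=10, random_state=0
)

# 80% of the samples for training, 20% for testing, as in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Feature selection followed by a decision tree. The selector and k are
# assumptions; the abstract only says the most pertinent features were chosen.
model = make_pipeline(
    SelectKBest(score_func=f_classif, k=15),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X_train, y_train)

# For labels [0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(
    y_test, model.predict(X_test), labels=[0, 1]
).ravel()
accuracy, tpr, fpr = rates_from_confusion(tn, fp, fn, tp)
print(f"accuracy={accuracy:.4f}  TPR={tpr:.4f}  FPR={fpr:.4f}")
```

Placing the selector inside the pipeline means it is fitted on the training split only, which avoids leaking information from the test samples into feature selection.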
