Detection of Unhealthy Websites Using Machine Learning

COAST Journal of the School of Science. Pub Date: 2024-06-03. DOI: 10.61281/coastjss.v6i1.5

O. A. Gbadamosi, A. M. Oduwale


Citations: 0

Abstract


DETECTION OF UNHEALTHY WEBSITES USING MACHINE LEARNING

In recent years, advances in Internet and cloud technologies have led to a significant increase in electronic trading, in which consumers make purchases and transactions online. Accompanying this growth are vices such as unauthorized access to users' sensitive information and damage to enterprise resources. Phishing is one of the familiar attacks: it tricks users into accessing malicious content in order to obtain their information. This study aims to develop an efficient machine-learning program that detects phishing websites with high accuracy. Most phishing webpages look identical to the legitimate pages they imitate, and various detection strategies, such as blacklisting and heuristics, have been suggested. Existing research shows that the performance of such phishing detection systems is limited and that there is a demand for intelligent techniques to protect users from cyber-attacks. A Uniform Resource Locator (URL) detection technique based on a supervised machine learning approach, Naïve Bayes, is employed and implemented in the Python programming language. The efficacy of this approach was evaluated on a phishing dataset made up of 7900 malicious and 5800 legitimate sites. The results show that the proposed methodology can achieve an accuracy of 96% by using stacking and filtering along with Naïve Bayes and logistic regression. This study investigates the use of machine learning with features extracted from URLs, identifies common words that distinguish phishing (unhealthy) websites from good ones, and offers end users a guide to recent approaches in malicious URL detection.
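To make the idea of classifying URLs from their words concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier over URL tokens, written with the Python standard library only. It is not the authors' implementation: the tokenizer, the class name `UrlNaiveBayes`, the Laplace smoothing, and the six toy URLs are all assumptions made for illustration, and the stacking, filtering, and logistic regression steps mentioned in the abstract are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (an assumed feature scheme)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class UrlNaiveBayes:
    """Multinomial Naive Bayes over URL tokens with add-one (Laplace) smoothing."""

    def fit(self, urls, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of URLs per class
        self.token_counts = {c: Counter() for c in self.classes}
        for url, label in zip(urls, labels):
            self.token_counts[label].update(tokenize(url))
        self.vocab = set().union(*self.token_counts.values())
        self.totals = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # log P(token | class) with add-one smoothing over the vocabulary
        count = self.token_counts[c][token]
        return math.log((count + 1) / (self.totals[c] + len(self.vocab)))

    def predict(self, url):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for token in tokenize(url):
                if token in self.vocab:  # ignore tokens never seen in training
                    score += self._log_likelihood(token, c)
            scores[c] = score
        return max(scores, key=scores.get)

    def top_tokens(self, c, other, k=3):
        """Tokens whose likelihood ratio most strongly favors class c over other."""
        def ratio(token):
            return self._log_likelihood(token, c) - self._log_likelihood(token, other)
        return sorted(self.vocab, key=ratio, reverse=True)[:k]


# Hypothetical toy data, invented for this sketch (not the paper's dataset).
TRAIN = [
    ("http://secure-login.example-bank.verify-account.test/update", "phishing"),
    ("http://paypa1.test/login/verify/account", "phishing"),
    ("http://free-gift.test/claim/verify", "phishing"),
    ("https://www.example.org/docs/tutorial", "good"),
    ("https://news.example.com/articles/science", "good"),
    ("https://www.example.edu/library/docs", "good"),
]

model = UrlNaiveBayes().fit([u for u, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("http://account-verify.test/login"))
print(model.predict("https://www.example.com/docs"))
print(model.top_tokens("phishing", "good"))
```

The `top_tokens` helper mirrors the abstract's point about common words: it ranks tokens by how much more likely they are under one class than the other. On a real dataset, accuracy would be measured on held-out URLs, not on the training examples.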


Source journal

COAST Journal of the School of Science

Self-citation rate

0.00%

Articles published