Machine Learning & Concept Drift based Approach for Malicious Website Detection

2020 International Conference on COMmunication Systems & NETworkS (COMSNETS)
Publication date: 2020-01-01
DOI: 10.1109/COMSNETS48256.2020.9027485

Siddharth Singhal, Utkarsh Chawla, R. Shorey


Citation count: 16

Abstract





The rampant increase in both the number of available cyber-attack vectors and the frequency of cyber attacks necessitates robust cybersecurity systems. Malicious websites are a significant threat to cybersecurity. Miscreants and hackers use them for illegal activities such as disrupting systems by implanting malware, gaining unauthorized access to systems, or illegally collecting personal information. We propose and implement an approach that classifies a website as malicious or benign, given its Uniform Resource Locator (URL) as input. From the user-supplied URL, we collect lexical, host-based, and content-based features of the website. These features are fed as input to a supervised machine learning algorithm, which classifies the URL as malicious or benign. The models are trained on a dataset of labeled malicious and benign URLs. We evaluate the classification accuracy of Random Forest, Gradient Boosted Decision Tree, and Deep Neural Network classifiers. One loophole in using machine learning for detection is that the same training data is also available to attackers. Miscreants exploit this data to alter the features of malicious URLs so that the supervised learning algorithms classify them as benign. Further, owing to the dynamic nature of malicious websites, we also propose a paradigm for detecting and countering these manually induced concept drifts.
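The abstract names lexical URL features and the detection of attacker-induced concept drift but does not list the features or the detection method. The sketch below is a hypothetical illustration only, using the Python standard library: the function names `lexical_features` and `detect_drift`, the feature set (`url_length`, `num_dots`, `host_is_ip`, and so on), and the mean-shift drift test with `threshold=3.0` are our assumptions, not the authors' feature list or their drift-countering paradigm. Host-based and content-based features, which require network lookups, are omitted.

```python
import ipaddress
import statistics
from urllib.parse import urlparse


def lexical_features(url: str) -> dict[str, float]:
    """Extract a few illustrative lexical features from a URL.

    Hypothetical feature set for illustration; not the paper's list.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # A raw IP address used in place of a domain name is a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1.0
    except ValueError:
        host_is_ip = 0.0
    return {
        "url_length": float(len(url)),
        "num_dots": float(host.count(".")),
        "num_hyphens": float(host.count("-")),
        "num_digits": float(sum(ch.isdigit() for ch in url)),
        "has_at_symbol": float("@" in url),
        "host_is_ip": host_is_ip,
        "uses_https": float(parsed.scheme == "https"),
        "path_depth": float(len([p for p in parsed.path.split("/") if p])),
    }


def detect_drift(reference, recent, threshold=3.0):
    """Return the names of features whose mean over `recent` lies more than
    `threshold` reference standard deviations from the reference mean.

    A simple mean-shift check swapped in for illustration (hypothetical).
    """
    if not reference or not recent:
        return []
    drifted = []
    for name in reference[0]:
        ref_vals = [f[name] for f in reference]
        new_vals = [f[name] for f in recent]
        ref_mean = statistics.mean(ref_vals)
        # A constant reference feature has zero spread; use a tiny value so
        # any change in its mean counts as drift instead of dividing by zero.
        ref_std = statistics.pstdev(ref_vals) or 1e-9
        shift = abs(statistics.mean(new_vals) - ref_mean) / ref_std
        if shift > threshold:
            drifted.append(name)
    return drifted
```

In this sketch, `reference` would hold features of the training URLs and `recent` a sliding window of newly submitted URLs; a non-empty result from `detect_drift` could serve as a signal to relabel data and retrain the classifier. A per-feature test like this is easy to read but ignores correlations between features, so it is a starting point rather than a complete drift detector.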

