Using Multiclass Machine Learning Methods to Classify Malicious Behaviors Aimed at Web Systems

2012 IEEE 23rd International Symposium on Software Reliability Engineering. Pub Date: 2012-11-27. DOI: 10.1109/ISSRE.2012.30

K. Goseva-Popstojanova, Goce Anastasovski, Risto Pantev


Citations: 17

Abstract

The number of vulnerabilities in and attacks on Web systems shows an increasing trend, and these tend to dominate on the Internet. Furthermore, due to their popularity and users' ability to create content, Web 2.0 applications have become particularly attractive targets. These trends clearly illustrate the need for a better understanding of malicious cyber activities based on both qualitative and quantitative analysis. This paper focuses on multiclass classification of malicious Web activities using three supervised machine learning methods: J48, PART, and Support Vector Machines (SVM). The empirical analysis is based on data collected over a period of nine months by a high-interaction honeypot consisting of a three-tier Web system, which included Web 2.0 applications (i.e., a blog and a wiki). Our results show that supervised learning methods can be used to efficiently distinguish among multiple vulnerability scan and attack classes, with high recall and precision values for all but several very small classes. For our dataset, the decision-tree-based methods J48 and PART perform slightly better than SVM in terms of overall accuracy and weighted recall. Additionally, J48 and PART require less than half of the features (i.e., session attributes) used by SVM, and they execute much faster. Therefore, they seem to be the clear methods of choice.
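The abstract reports per-class recall and precision alongside overall accuracy and weighted recall. As a minimal sketch of how such metrics are typically derived from a multiclass confusion matrix, the Python below assumes the common convention of weighting each class's recall by its number of true instances; the abstract does not spell out its exact definitions. The class labels and counts are invented toy values for illustration only and are not results from the paper.

```python
from typing import Dict, List

# Hypothetical toy data: rows are true classes, columns are predicted classes.
# These labels and counts are made up for illustration, not taken from the paper.
TOY_LABELS = ["scan_a", "attack_b", "attack_c"]
TOY_CONFUSION = [
    [90, 5, 5],
    [10, 80, 10],
    [2, 1, 2],   # a very small class, which tends to score poorly
]


def per_class_metrics(confusion: List[List[int]],
                      labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Return recall, precision, and support (true-instance count) per class."""
    n = len(labels)
    metrics: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        true_positives = confusion[i][i]
        support = sum(confusion[i])                          # instances truly in class i
        predicted = sum(confusion[r][i] for r in range(n))   # instances assigned to class i
        metrics[label] = {
            "recall": true_positives / support if support else 0.0,
            "precision": true_positives / predicted if predicted else 0.0,
            "support": float(support),
        }
    return metrics


def weighted_recall(metrics: Dict[str, Dict[str, float]]) -> float:
    """Average the per-class recall, weighting each class by its support."""
    total = sum(m["support"] for m in metrics.values())
    return sum(m["recall"] * m["support"] for m in metrics.values()) / total


def overall_accuracy(confusion: List[List[int]]) -> float:
    """Fraction of all instances that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


if __name__ == "__main__":
    toy_metrics = per_class_metrics(TOY_CONFUSION, TOY_LABELS)
    for name, values in toy_metrics.items():
        print(name, values)
    print("weighted recall:", weighted_recall(toy_metrics))
    print("overall accuracy:", overall_accuracy(TOY_CONFUSION))
```

Under this support-weighted convention, weighted recall is algebraically equal to overall accuracy, so the two figures coincide whenever both are computed from the same confusion matrix. Per-class values remain the more informative view for small classes.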

