PU Learning in Payload-based Web Anomaly Detection

2018 Third International Conference on Security of Smart Cities, Industrial Control System and Communications (SSIC). Pub Date: 2018-10-01. DOI: 10.1109/SSIC.2018.8556662

Yuxuan Luo, Shaoyin Cheng, Chong Liu, Fan Jiang

Citations: 10

Abstract

Intrusion detection is one of the most important methods for protecting web-based applications. Most anomaly detection approaches detect new types of malicious web traffic poorly. Misuse detection methods, in turn, rely on malicious pattern matching, and the patterns usually depend on security experts. Although some supervised techniques have been applied, HTTP traffic datasets in real scenarios are impure and diverse. In this paper, we propose a new web anomaly detection method, based on HTTP payload data, that combines a supervised learning model with PU learning (Positive and Unlabeled learning). To represent as many data patterns as possible, we vectorize each HTTP request payload at the byte level by its numeric ASCII or Unicode values, so that every payload is represented as a fixed-dimension numerical vector. First, our approach trains a base supervised XGBoost model to learn most known attack traffic; the traffic it classifies as normal is then passed to a classifier based on the PU learning algorithm to find unknown malicious traffic. We test our model on a dataset gathered from a well-known security enterprise, and the results show that our model achieves high accuracy in detecting known attacks and substantially improves the detection of unknown malicious web traffic.
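
The byte-level vectorization step can be sketched as follows. The abstract only says payloads are mapped to their numeric ASCII or Unicode values and end up as fixed-dimension vectors, so several details here are assumptions: UTF-8 as the byte encoding, truncation of long payloads, zero padding of short ones, and the hypothetical names `vectorize_payload` and `VECTOR_DIM` (with an illustrative length of 128).

```python
from typing import List

# Hypothetical fixed dimension: the abstract does not say what length the authors used.
VECTOR_DIM = 128


def vectorize_payload(payload: str, dim: int = VECTOR_DIM) -> List[int]:
    """Map an HTTP request payload to a fixed-length vector of byte values.

    Assumptions (not stated in the abstract): the payload is encoded as UTF-8,
    so a non-ASCII character contributes several byte values; payloads longer
    than `dim` bytes are truncated and shorter ones are right-padded with 0.
    """
    raw = payload.encode("utf-8")[:dim]   # byte-level view, cut to at most dim bytes
    vec = list(raw)                        # iterating over bytes yields ints in 0..255
    vec.extend([0] * (dim - len(vec)))     # pad so every payload has the same dimension
    return vec
```

For example, `vectorize_payload("GET /index.php?id=1")` returns a 128-element list that begins with 71, 69, 84 (the byte values of "G", "E", "T") and ends in zero padding. Padding with 0 keeps every vector the same length, but it is only unambiguous as long as raw NUL bytes are rare in the payloads; that is an assumption of this sketch.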

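The two-stage design (a supervised model for known attacks, then a PU classifier over what it passes as normal) can be sketched as below. The abstract does not name the PU algorithm, so this sketch swaps in one well-known option, the Elkan and Noto (2008) estimator: train a classifier g to separate labeled positives from unlabeled samples, estimate c = P(labeled | positive) as the mean of g on held-out positives, and score g(x) / c. Everything else is hypothetical: `stage1` is a placeholder callable standing in for the trained XGBoost model, positives are assumed to be labeled attack vectors, the unlabeled set is assumed to be the traffic stage 1 passed, the names `train_logreg`, `fit_pu`, and `cascade` and the 0.5 threshold are illustrative, and the plain gradient-descent logistic regression expects scaled features (for example, byte values divided by 255).

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs: List[Vector], ys: List[int], epochs: int = 300, lr: float = 0.5) -> Scorer:
    """Batch gradient-descent logistic regression; returns x -> P(label = 1 | x)."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_pu(positives: List[Vector], unlabeled: List[Vector],
           holdout_frac: float = 0.2, seed: int = 0) -> Scorer:
    """Elkan-Noto style PU learning (an illustrative choice, not the paper's stated method).

    1. Train g to separate labeled positives (s=1) from unlabeled samples (s=0).
    2. Estimate c = P(s=1 | y=1) as the mean of g on held-out positives.
    3. Score P(y=1 | x) as g(x) / c, clipped to 1.
    """
    if len(positives) < 2 or not unlabeled:
        raise ValueError("need at least two positives and some unlabeled samples")
    pos = list(positives)
    random.Random(seed).shuffle(pos)
    k = max(1, int(len(pos) * holdout_frac))
    held, train_pos = pos[:k], pos[k:]
    g = train_logreg(train_pos + list(unlabeled), [1] * len(train_pos) + [0] * len(unlabeled))
    c = sum(g(x) for x in held) / len(held)
    return lambda x: min(1.0, g(x) / c)


def cascade(requests: List[Vector], stage1: Callable[[Vector], bool],
            pu_score: Scorer, threshold: float = 0.5) -> List[str]:
    """Stage 1 flags known attacks; the PU scorer re-examines everything stage 1 passed."""
    labels = []
    for x in requests:
        if stage1(x):
            labels.append("known_attack")
        elif pu_score(x) >= threshold:
            labels.append("suspected_unknown")
        else:
            labels.append("normal")
    return labels
```

With labeled attack vectors as `positives` and the stage-1 "normal" traffic as `unlabeled`, `cascade(batch, stage1, fit_pu(positives, unlabeled))` returns one label per request. Note that the Elkan and Noto estimator assumes labeled positives are a random sample of all positives; attacks of a genuinely new type may violate that assumption, so the threshold would need tuning on real data.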
