Decoupling non-stationary and stationary components in long range network time series in the context of anomaly detection
Cyriac James, H. Murthy
37th Annual IEEE Conference on Local Computer Networks, 2012-10-22
DOI: 10.1109/LCN.2012.6423689 (https://doi.org/10.1109/LCN.2012.6423689)
Citations: 8
Abstract
Network traffic characterisation and modeling using time series models has been studied extensively. Coarse-grained (aggregated traffic) time series analysis using a parametric approach, carried out primarily at backbone networks over long periods (of the order of days to months), shows strong deterministic cyclic trends, while its fine-grained (packet- or flow-level) counterpart, done mostly at edge networks over short periods (of the order of a few minutes), exhibits self-similar behaviour. This paper studies the fine-grained time series characteristics of network traffic at an edge network, observed over a long period (of the order of days and weeks), using a parametric approach. The analysis is carried out in the context of anomaly detection. Most earlier attempts in this direction followed a non-parametric approach, using either adaptive or non-adaptive (i.e. assuming stationarity) mechanisms, whose performance is extremely sensitive to empirically determined model parameters that are therefore difficult to set. Moreover, the model parameters need to be recomputed at regular intervals (of the order of a few seconds to minutes). To some extent, this makes such algorithms less attractive in terms of generality and practical implementation. The first part of the paper discusses the statistical characteristics of such long range network time series. These are found to exhibit structural breaks in addition to transient shocks, and can be approximated by a stationary AR model after an absolute first difference transformation (i.e. decoupling the stationary component from the non-stationary one). In the later part of the paper, the efficacy of the proposed model is evaluated through extensive trace-driven simulations for the detection of low-intensity TCP SYN flood Denial of Service (DoS) attacks.
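The transformation and model described above can be illustrated with a minimal sketch. The abstract does not give the AR order, the fitting procedure or the detection rule, so everything below is an assumption made for illustration: an AR(1) model fitted by ordinary least squares, a threshold of mean plus `k` standard deviations of the training residuals, and synthetic per-interval SYN counts. The function names are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev

def abs_first_difference(x):
    """Return |x[t] - x[t-1]| for t = 1 .. len(x) - 1."""
    return [abs(b - a) for a, b in zip(x, x[1:])]

def fit_ar1(y):
    """Ordinary least-squares fit of y[t] = c + phi * y[t-1] + e[t] (AR(1) is an assumption)."""
    prev, curr = y[:-1], y[1:]
    m_prev, m_curr = mean(prev), mean(curr)
    var = sum((p - m_prev) ** 2 for p in prev)
    if var == 0:
        # Constant series: no slope can be estimated, so predict the mean.
        return m_curr, 0.0
    cov = sum((p - m_prev) * (q - m_curr) for p, q in zip(prev, curr))
    phi = cov / var
    return m_curr - phi * m_prev, phi

def residuals(y, c, phi):
    """One-step-ahead prediction errors, aligned with y[1:]."""
    return [q - (c + phi * p) for p, q in zip(y, y[1:])]

def detect(train_counts, test_counts, k=3.0):
    """Flag test transitions whose residual exceeds mean + k * std of the training residuals.

    The threshold rule and k = 3 are illustrative assumptions, not the paper's detector.
    flags[i] refers to the transition ending at test_counts[i + 2].
    """
    y_train = abs_first_difference(train_counts)
    y_test = abs_first_difference(test_counts)
    c, phi = fit_ar1(y_train)
    r_train = residuals(y_train, c, phi)
    threshold = mean(r_train) + k * pstdev(r_train)
    return [r > threshold for r in residuals(y_test, c, phi)]

# Synthetic counts (hypothetical data): a level shift at index 200 plays the
# role of a structural break, and one burst in the test trace plays the attack.
train = [100 + (7 * i) % 11 for i in range(200)] + [300 + (5 * i) % 13 for i in range(200)]
test = [300 + (5 * i) % 13 for i in range(50)]
test[30] += 400  # injected burst
flags = detect(train, test)
```

In this toy trace the level shift shows up in the differenced series as a single large value rather than a lasting change of mean, which is the intuition behind treating the differenced series as stationary. Whether that holds on real traces is the empirical question the paper addresses.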
Performance is measured in terms of false positives, false alarm time, detection rate and detection delay. Experiments are performed on actual traffic traces collected from one edge network over a period of three months and for various sampling intervals (10 s, 60 s, 120 s). Comparative studies with adaptive and non-adaptive methods are carried out to demonstrate the relevance of the proposed model. The proposed method is observed to perform better, achieving 100% detection accuracy with a false positive rate as low as 0.9%.
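The four performance measures named above can be computed from per-interval alarms and labelled attack windows roughly as follows. The abstract does not define the measures, so the definitions here are assumptions: the false positive rate is the fraction of non-attack intervals that raised an alarm, false alarm time is the number of such intervals multiplied by the sampling interval, an attack counts as detected if any alarm falls inside its window, and detection delay is the time from the start of the attack to the first alarm inside it. The function `evaluate` and its argument names are hypothetical.

```python
def evaluate(alarms, attacks, interval_s):
    """Score per-interval alarms against labelled attack windows.

    alarms: list of bool, one per sampling interval.
    attacks: list of (start, end) index pairs, end exclusive.
    interval_s: sampling interval in seconds (e.g. 10, 60 or 120).
    All metric definitions below are illustrative assumptions.
    """
    in_attack = [False] * len(alarms)
    for start, end in attacks:
        for i in range(start, end):
            in_attack[i] = True

    # Alarms raised while no attack is in progress.
    false_alarms = sum(1 for a, atk in zip(alarms, in_attack) if a and not atk)
    normal = in_attack.count(False)

    # Delay to the first alarm inside each attack window, if there is one.
    delays = []
    for start, end in attacks:
        hit = next((i for i in range(start, end) if alarms[i]), None)
        if hit is not None:
            delays.append((hit - start) * interval_s)

    return {
        "false_positive_rate": false_alarms / normal if normal else 0.0,
        "false_alarm_time_s": false_alarms * interval_s,
        "detection_rate": len(delays) / len(attacks) if attacks else 0.0,
        "mean_detection_delay_s": sum(delays) / len(delays) if delays else None,
    }

# Hypothetical example: 20 intervals of 60 s, one attack during intervals 10..14,
# one false alarm at interval 3 and the first true alarm at interval 12.
alarms = [False] * 20
alarms[3] = True
alarms[12] = True
scores = evaluate(alarms, attacks=[(10, 15)], interval_s=60)
```

With these definitions, a coarser sampling interval puts a floor on the measurable detection delay, which is one reason the results may differ across the 10 s, 60 s and 120 s settings.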