Sequential anomaly detection in a batch with growing number of tests: Application to network intrusion detection

2012 IEEE International Workshop on Machine Learning for Signal Processing. Pub Date: 2012-11-12. DOI: 10.1109/MLSP.2012.6349793

David J. Miller, Fatih Kocak, G. Kesidis


Citations: 11

Abstract

For high (N)-dimensional feature spaces, we consider detection of an unknown, anomalous class of samples amongst a batch of collected samples (of size T), under the null hypothesis that all samples follow the same probability law. Since the features which will best identify the anomalies are a priori unknown, several common detection strategies are: 1) evaluating atypicality of a sample (its p-value) based on the null distribution defined on the full N-dimensional feature space; 2) considering a (combinatoric) set of low-order distributions, e.g., all singletons and all feature pairs, with detections made based on the smallest p-value yielded over all such low-order tests. The first approach relies on accurate estimation of the joint distribution, while the second may suffer from increased false alarm rates as N and T grow. Alternatively, inspired by greedy feature selection commonly used in supervised learning, we propose a novel sequential anomaly detection procedure with a growing number of tests. Here, new tests are (greedily) included only when they are needed, i.e., when their use (on currently undetected samples) will yield greater aggregate statistical significance of (multiple-testing-corrected) detections than obtainable using the existing test cadre. Our approach thus aims to maximize aggregate statistical significance of all detections made up until a finite horizon. Our method is evaluated, along with supervised methods, for a network intrusion domain, detecting Zeus bot (intrusion) packet flows embedded amongst (normal) Web flows. It is shown that judicious feature representation is essential for discriminating Zeus from Web.
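
The abstract does not spell out the procedure, so the Python sketch below is only one possible reading of the greedy idea, not the authors' algorithm. It assumes single-feature (singleton) tests with a Gaussian null of known mean and standard deviation, and a Bonferroni-style correction that multiplies each p-value by the number of tests in use and by the number of still-undetected samples. The function names, the correction rule, and the toy data are all hypothetical.

```python
import math


def gaussian_two_sided_p(x, mean, std):
    """Two-sided p-value of x under a Gaussian null N(mean, std^2)."""
    z = abs(x - mean) / std
    return math.erfc(z / math.sqrt(2.0))


def greedy_sequential_detect(batch, null_means, null_stds, horizon, alpha=0.05):
    """Illustrative greedy loop (an assumption, not the paper's method).

    Each step makes at most one detection. A single-feature test joins the
    active set only if, after paying the larger correction for one extra
    test, it still gives the smallest corrected p-value.
    """
    n_features = len(null_means)
    # p[t][j]: p-value of sample t under the singleton test on feature j.
    p = [[gaussian_two_sided_p(x[j], null_means[j], null_stds[j])
          for j in range(n_features)] for x in batch]
    active = []                          # tests in use, in order of adoption
    undetected = set(range(len(batch)))
    detections = []                      # (sample, feature, corrected p-value)
    for _ in range(horizon):             # finite horizon on the number of detections
        if not undetected:
            break
        best = None
        for t in sorted(undetected):
            for j in range(n_features):
                # A new test enlarges the test set by one, so it is penalised more.
                n_tests = len(active) + (0 if j in active else 1)
                corrected = min(1.0, p[t][j] * n_tests * len(undetected))
                if best is None or corrected < best[2]:
                    best = (t, j, corrected)
        if best[2] > alpha:
            break                        # nothing left is significant after correction
        t, j, _ = best
        if j not in active:
            active.append(j)
        undetected.remove(t)
        detections.append(best)
    return detections, active


# Toy batch (made-up numbers): samples 1 and 3 are extreme on feature 2 only.
demo_batch = [[0.1, -0.2, 0.3], [0.0, 0.5, 6.0], [-0.4, 0.2, 0.1], [0.3, -0.1, 5.5]]
demo_detections, demo_active = greedy_sequential_detect(
    demo_batch, null_means=[0.0, 0.0, 0.0], null_stds=[1.0, 1.0, 1.0], horizon=4)
print([d[0] for d in demo_detections], demo_active)
```

In this toy run only the test on feature 2 is ever adopted, because the other features never give a corrected p-value below `alpha`. This mirrors the abstract's point that tests are added only when needed, rather than applying every low-order test up front and paying the full multiple-testing cost.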
