A Learning-based Data Augmentation for Network Anomaly Detection

2020 29th International Conference on Computer Communications and Networks (ICCCN). Pub Date: 2020-08-01. DOI: 10.1109/ICCCN49398.2020.9209598

Mohammad Al Olaimat, Dongeun Lee, Youngsoo Kim, Jong-Hoi Kim, Jinoh Kim


Citations: 5

Abstract


While machine learning technologies have advanced remarkably over the past several years, one fundamental requirement for the success of learning-based approaches is the availability of high-quality data that thoroughly represent the individual classes in a problem space. Unfortunately, many datasets exhibit a significant degree of class imbalance, with only a few instances for the minority classes; network traffic traces, for example, are highly skewed toward a large number of normal connections, while attack instances are very few. A well-known approach to the class imbalance problem is data augmentation, which generates synthetic instances belonging to the minority classes. However, traditional statistical techniques may be limited, since data extended through statistical sampling should have the same density as the original data instances, with only a minor degree of variation. This paper takes a learning-based approach to data augmentation to enable effective network anomaly detection. One of the critical challenges for the learning-based approach is the mode collapse problem, which results in limited sample diversity and which we also observed in our preliminary experiments. To this end, we present a novel "Divide-Augment-Combine" (DAC) strategy, which groups instances based on their characteristics and augments data on a per-group basis, using a generative adversarial model to represent each subset independently. Our experimental results on two recently collected public network datasets (UNSW-NB15 and IDS-2017) show that the proposed technique improves performance by up to 21.5% in identifying network anomalies.
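The Divide-Augment-Combine idea described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the grouping rule (plain k-means), the proportional split of synthetic samples across groups, and the generator are all assumptions, and the names `divide`, `augment`, and `dac` are hypothetical. The paper augments each group with a generative adversarial model; here a per-group diagonal Gaussian sampler stands in for it so the example stays self-contained. The point the sketch illustrates is structural: because every group gets its own generator and its own sample quota, a small subset of the minority class cannot be dropped the way a single mode-collapsed generator might drop it.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def divide(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[Vector]]:
    """Divide: group minority instances with plain k-means (assumed grouping rule)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    groups: List[List[Vector]] = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[nearest].append(list(p))
        for c, members in enumerate(groups):
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [g for g in groups if g]


def augment(group: List[Vector], n_new: int, rng: random.Random) -> List[Vector]:
    """Augment one group independently.

    STAND-IN for the per-group generative adversarial model described in the
    paper: each feature is sampled from a Gaussian fitted to this group only.
    """
    means = [sum(col) / len(group) for col in zip(*group)]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(group))
            for col, m in zip(zip(*group), means)]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n_new)]


def dac(minority: List[Vector], target_size: int, k: int = 3, seed: int = 0) -> List[Vector]:
    """Divide-Augment-Combine: grow the minority class to target_size instances."""
    rng = random.Random(seed)
    groups = divide(minority, k, seed=seed)
    deficit = max(0, target_size - len(minority))
    synthetic: List[Vector] = []
    for i, group in enumerate(groups):
        # Split the deficit in proportion to group size (floored); the last
        # group absorbs the remainder so the total is exactly target_size.
        if i == len(groups) - 1:
            n_new = deficit - len(synthetic)
        else:
            n_new = deficit * len(group) // len(minority)
        synthetic.extend(augment(group, n_new, rng))
    # Combine: original minority instances followed by all synthetic ones.
    return [list(p) for p in minority] + synthetic


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy minority class with two well-separated modes in 2-D.
    minority = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(10)] + \
               [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(10)]
    augmented = dac(minority, target_size=100, k=2)
    print(len(augmented))  # 100
```

Replacing `augment` with a generative model trained separately on each group would bring the sketch closer to the strategy the abstract describes; the divide and combine steps would stay the same.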


Source venue

2020 29th International Conference on Computer Communications and Networks (ICCCN)

Self-citation rate

0.00%

Articles published