Towards Generating Semi-Synthetic Datasets for Network Intrusion Detection System

Ngoc-Truong Nguyen, Ton-Nhan Le, Khanh-Hoi Le Minh, Kim-Hung Le
2023 International Conference on Information Networking (ICOIN), published 2023-01-11. DOI: 10.1109/ICOIN56518.2023.10048962 (https://doi.org/10.1109/ICOIN56518.2023.10048962)
Citations: 1

Abstract

We have witnessed the proliferation of machine learning and its applications, especially in network-based intrusion detection systems (NIDS). With their ability to learn complex, informative patterns from data, machine learning models play a crucial role in identifying and preventing network attacks. However, training these models requires a massive volume of labeled data, which is nontrivial to obtain. Moreover, public datasets are often imbalanced, outdated, and different from the traffic of the networks that need to be protected. Therefore, in this paper, we introduce a framework, named DGIDS, for generating semi-synthetic datasets for NIDS by combining synthetic data with regular network traffic collected from the local network. Our proposed framework can produce both benign and attack network data with characteristics similar to those observed in real scenarios. In practical experiments, we show that the network data generated by DGIDS significantly increases the detection quality of an NIDS trained on public datasets, from 54% to 90.39%.
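The abstract does not specify how DGIDS merges its two data sources. The following is a minimal Python sketch of the general idea only, assuming flow-level records and a binary label (0 = benign, 1 = attack): locally captured benign flows are combined with synthetic attack flows at a chosen class ratio, which is one way to counter the class imbalance of public datasets. The `Flow` fields, `build_semi_synthetic`, and `attack_fraction` are hypothetical illustrations, not DGIDS's actual interface.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical flow-level feature record; the real DGIDS feature set
# is not described in the abstract.
@dataclass(frozen=True)
class Flow:
    duration_s: float
    bytes_sent: int
    packets: int


def build_semi_synthetic(
    local_benign: List[Flow],
    synthetic_attacks: List[Flow],
    attack_fraction: float = 0.5,
    seed: int = 0,
) -> List[Tuple[Flow, int]]:
    """Hypothetical helper: merge locally captured benign flows (label 0)
    with synthetic attack flows (label 1) so that roughly `attack_fraction`
    of the result is attack traffic."""
    if not 0.0 < attack_fraction < 1.0:
        raise ValueError("attack_fraction must be in (0, 1)")
    rng = random.Random(seed)
    # Attack flows needed to reach the target ratio, keeping all benign flows.
    n_attack = round(len(local_benign) * attack_fraction / (1.0 - attack_fraction))
    # Cannot draw more synthetic attacks than are available.
    n_attack = min(n_attack, len(synthetic_attacks))
    attacks = rng.sample(synthetic_attacks, n_attack)
    dataset = [(f, 0) for f in local_benign] + [(f, 1) for f in attacks]
    # Shuffle so records from the two sources are interleaved.
    rng.shuffle(dataset)
    return dataset
```

For example, with 60 local benign flows and `attack_fraction=0.25`, the function draws 20 synthetic attack flows and returns 80 labeled records. A real pipeline would additionally need feature extraction from captured packets and some check that the synthetic flows resemble real attacks; neither is shown here.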