Ngoc-Truong Nguyen, Ton-Nhan Le, Khanh-Hoi Le Minh, Kim-Hung Le
Towards Generating Semi-Synthetic Datasets for Network Intrusion Detection System
Published in: 2023 International Conference on Information Networking (ICOIN), 2023-01-11
DOI: 10.1109/ICOIN56518.2023.10048962
Citations: 1
Abstract
We have witnessed the proliferation of machine learning and its applications, especially in network-based intrusion detection systems (NIDS). Because they can learn complex, informative patterns from data, machine learning models play a crucial role in identifying and preventing network attacks. However, training these models requires a massive volume of labeled data, which is nontrivial to obtain. Moreover, public datasets are often imbalanced, outdated, and different from the traffic of the networks that need to be protected. Therefore, in this paper, we introduce a framework, named DGIDS, for generating semi-synthetic datasets for NIDS by combining synthetic data with regular network traffic collected from the local network. Our proposed framework can produce both benign and attack network data with characteristics similar to those observed in real scenarios. In practical experiments, we show that the network data generated by DGIDS significantly increase the detection quality of NIDS trained on public datasets, from 54% to 90.39%.
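The abstract describes DGIDS only at a high level and does not say how its synthetic data is produced. As a hedged illustration of the general idea of a semi-synthetic dataset, the minimal Python sketch below keeps all real benign flow records captured on the local network and mixes in synthetic attack records until a chosen attack fraction is reached. The names FlowRecord, build_semi_synthetic and attack_ratio, the three flow features, and the resampling strategy are all hypothetical assumptions for illustration; this is not the DGIDS implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical flow-level features; real NIDS datasets use many more.
    duration_s: float
    bytes_sent: int
    packets: int
    label: str  # "benign" or an attack class such as "dos"


def build_semi_synthetic(local_benign, synthetic_attacks, attack_ratio, seed=0):
    """Mix real benign flows with synthetic attack flows (illustrative only).

    attack_ratio is the desired fraction of attack records in the output.
    Every benign record is kept; attack records are drawn with replacement
    from synthetic_attacks until the target fraction is reached.
    """
    if not 0.0 <= attack_ratio < 1.0:
        raise ValueError("attack_ratio must be in [0, 1)")
    rng = random.Random(seed)
    n_benign = len(local_benign)
    # Solve n_attack / (n_benign + n_attack) = attack_ratio for n_attack.
    n_attack = round(attack_ratio * n_benign / (1.0 - attack_ratio))
    if n_attack and not synthetic_attacks:
        raise ValueError("need synthetic attack records to reach attack_ratio")
    attacks = [rng.choice(synthetic_attacks) for _ in range(n_attack)]
    dataset = list(local_benign) + attacks
    rng.shuffle(dataset)  # avoid ordering artifacts between the two sources
    return dataset


if __name__ == "__main__":
    benign = [FlowRecord(1.0, 100, 10, "benign") for _ in range(6)]
    attacks = [FlowRecord(0.1, 5000, 400, "dos"), FlowRecord(0.2, 60, 1, "portscan")]
    data = build_semi_synthetic(benign, attacks, attack_ratio=0.25)
    print(len(data), sum(r.label != "benign" for r in data))
```

With six benign records and attack_ratio=0.25, the sketch adds two attack records so that attacks make up a quarter of the eight-record dataset. Fixing the class balance explicitly is one simple way to counter the imbalance the abstract attributes to public datasets; the actual balance and generation method used by DGIDS are not stated here.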