A Multi-Agent Intrusion Detection System Optimized by a Deep Reinforcement Learning Approach with a Dataset Enlarged Using a Generative Model to Reduce the Bias Effect

IF 4.2, JCR Q2 (Computer Science, Information Systems)

Journal of Sensor and Actuator Networks. Pub Date: 2023-09-18. DOI: 10.3390/jsan12050068

Matthieu Mouyart, Guilherme Medeiros Machado, Jae-Yun Jun

Citation count: 0

Abstract

Intrusion detection systems can perform poorly when they are trained on datasets that are unbalanced between attack and non-attack data. Most datasets contain more non-attack data than attack data, which can bias intrusion detection systems and leave them vulnerable to cyberattacks. To remedy this issue, we considered the Conditional Tabular Generative Adversarial Network (CTGAN), with its hyperparameters optimized using the tree-structured Parzen estimator (TPE), to balance CMU-CERT, a tabular insider threat dataset with both discrete-valued and continuous-valued columns. We showed that, with this method, the mean absolute errors between the probability mass functions (PMFs) of the actual data and the PMFs of the CTGAN-generated data can be relatively small. We then generated synthetic insider threat data from the optimized CTGAN and combined them with the actual data to balance the original dataset. We used the resulting dataset for an intrusion detection system implemented with the Adversarial Environment Reinforcement Learning (AE-RL) algorithm in a multi-agent framework formed by an attacker and a defender. We showed that intrusion detection with the combined CTGAN and AE-RL framework improves significantly over the case where the dataset is not balanced, reaching an F1-score of 0.7617.
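The PMF comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes the mean absolute error is averaged over the union of categories observed in a single discrete column; the abstract does not state how the authors aggregate the error, so this is one plausible reading, not their implementation. The function names `pmf` and `pmf_mae` and the toy values are hypothetical and are not taken from CMU-CERT.

```python
from collections import Counter


def pmf(values, support):
    """Empirical probability mass function of `values` over a fixed `support`."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in support]


def pmf_mae(real_values, synthetic_values):
    """Mean absolute error between the empirical PMFs of two discrete columns.

    Assumption: the error is averaged over the union of observed categories.
    """
    if not real_values or not synthetic_values:
        raise ValueError("both columns must be non-empty")
    # Use the union of observed categories so both PMFs share one support.
    support = sorted(set(real_values) | set(synthetic_values))
    p = pmf(real_values, support)
    q = pmf(synthetic_values, support)
    return sum(abs(a - b) for a, b in zip(p, q)) / len(support)


# Toy columns (hypothetical values, for illustration only).
real = ["logon", "logon", "email", "usb"]
synthetic = ["logon", "email", "email", "usb"]
print(pmf_mae(real, synthetic))
```

A value close to zero would indicate that the synthetic column reproduces the category frequencies of the real column; identical columns give exactly zero.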

Source journal

Journal of Sensor and Actuator Networks Physics and Astronomy-Instrumentation

CiteScore

7.90

Self-citation rate

2.90%

Number of articles published

Review time

11 weeks

About the journal: Journal of Sensor and Actuator Networks (ISSN 2224-2708) is an international open access journal on the science and technology of sensor and actuator networks. It publishes regular research papers, reviews (including comprehensive reviews on complete sensor and actuator networks), and short communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.