Causal Genetic Network Anomaly Detection Method for Imbalanced Data and Information Redundancy

IF 4.7; Zone 2, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Network and Service Management Pub Date : 2024-09-06 DOI:10.1109/TNSM.2024.3455768

Zengri Zeng;Xuhui Liu;Ming Dai;Jian Zheng;Xiaoheng Deng;Detian Zeng;Jie Chen

Volume 21, Issue 6, Pages 6937-6952. Journal Article. https://ieeexplore.ieee.org/document/10668849/

Citations: 0

Abstract

The proliferation of Internet-connected devices and the complexity of modern network environments have led to the collection of massive and high-dimensional datasets, resulting in substantial information redundancy and sample imbalance issues. These challenges not only hinder the computational efficiency and generalizability of anomaly detection systems but also compromise their ability to detect rare attack types, posing significant security threats. To address these pressing issues, we propose a novel causal genetic network-based anomaly detection method, the CNSGA, which integrates causal inference and the nondominated sorting genetic algorithm-III (NSGA-III). The CNSGA leverages causal reasoning to exclude irrelevant information, focusing solely on the features that are causally related to the outcome labels. Simultaneously, NSGA-III iteratively eliminates redundant information and prioritizes minority samples, thereby enhancing detection performance. To quantitatively assess the improvements achieved, we introduce two indices: a detection balance index and an optimal feature subset index. These indices, along with the causal effect weights, serve as fitness metrics for iterative optimization. The optimized individuals are then selected for subsequent population generation on the basis of nondominated reference point ordering. The experimental results obtained with four real-world network attack datasets demonstrate that the CNSGA significantly outperforms existing methods in terms of overall precision, the imbalance index, and the optimal feature subset index, with maximum increases exceeding 10%, 0.5, and 50%, respectively. Notably, for the CICDDoS2019 dataset, the CNSGA requires only 16-dimensional features to effectively detect more than 70% of all sample types, including 6 more network attack sample types than the other methods detect. 
The significance and impact of this work encompass the ability to eliminate redundant information, increase detection rates, balance attack detection systems, and ensure stability and generalizability. The proposed CNSGA framework represents a significant step forward in developing efficient and accurate anomaly detection systems capable of defending against a wide range of cyber threats in complex network environments.
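
The selection step described in the abstract, in which individuals are ranked by nondominated ordering over several fitness metrics, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the detection balance index, the optimal feature subset index, or the causal effect weights, so the three objective values below are hypothetical placeholders, each expressed so that smaller is better. The sketch covers only the nondominated sorting part; the reference-point niching that NSGA-III applies on top of it is omitted.

```python
from typing import List, Sequence, Tuple

# One tuple of objective values per candidate feature subset.
# Assumption: every objective is minimised (e.g. 1 - balance, fraction of
# features kept, negated causal weight). These are illustrative stand-ins,
# not the indices defined in the paper.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated_sort(points: Sequence[Objectives]) -> List[List[int]]:
    """Split candidate indices into successive Pareto fronts (front 0 is best)."""
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A candidate is in the current front if no remaining candidate dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical scores for four candidate feature subsets:
# (1 - balance, fraction of features selected, -causal weight)
scores: List[Objectives] = [
    (0.20, 0.25, -0.9),  # candidate 0
    (0.30, 0.50, -0.8),  # candidate 1
    (0.10, 0.60, -0.7),  # candidate 2
    (0.35, 0.60, -0.6),  # candidate 3
]

fronts = nondominated_sort(scores)
print(fronts)  # [[0, 2], [1], [3]]
```

Candidates 0 and 2 form the first front because each is better than the other on at least one objective; a full NSGA-III step would then fill the next population front by front and use reference points to choose among candidates in the last front that only partly fits.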

Source journal

IEEE Transactions on Network and Service Management (Computer Science: Computer Networks and Communications)

CiteScore

9.30

Self-citation rate

15.10%

Articles published

325

Journal description: IEEE Transactions on Network and Service Management will publish (online only) peer-reviewed, archival-quality papers that advance the state of the art and practical applications of network and service management. Theoretical research contributions (presenting new concepts and techniques) and applied contributions (reporting on experiences and experiments with actual systems) will be encouraged. These transactions will focus on the key technical issues related to: Management Models, Architectures and Frameworks; Service Provisioning, Reliability and Quality Assurance; Management Functions; Enabling Technologies; Information and Communication Models; Policies; Applications and Case Studies; Emerging Technologies and Standards.