Title: An Intelligent Game Theory Framework for Detecting Advanced Persistent Threats
Authors: Hao Yan, Qianzhen Zhang, Junjie Xie, Ziyue Lu, Sheng Chen, Deke Guo
DOI: 10.1109/ICPADS53394.2021.00062
Published in: 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)
Publication date: 2021-12-01
Citations: 0
Abstract
An Intelligent Game Theory Framework for Detecting Advanced Persistent Threats
An advanced persistent threat (APT) is a stealthy cyber attack in which a group gains unauthorized access to a computer network and remains undetected while stealing specific data and resources. Fast detection of and defense against APT attacks are critical tasks in cyber security. Previous works use simple feature extraction and classification methods to distinguish APT information flows from normal ones. However, APT attacks are latent: they generate very little traffic and hide among many normal information flows. Moreover, APT attacks can adjust their behavior to the environment, making them hard to discover and their features hard to extract. Meanwhile, dynamic information flow tracking (DIFT) is a tool for tracking information flows that can likewise adjust its tagging strategy to the environment, and it is often used to track and detect APT information flows. Game theory, in turn, provides mathematical models of strategic interaction between two or more parties. This motivates us to build a game-theoretic model to address these challenges. In this paper, we propose an intelligent game-theoretic framework named DPS, which models the strategic interaction between APTs and DIFT and aims to obtain a high reward for DIFT. The DPS framework uses deep reinforcement learning to find the Nash equilibrium of the game, which is a nonzero-sum, average-reward stochastic game. Specifically, we design a subgraph pruning strategy and a deep Q-network to guide the players in exploring new strategies on the information flow graph. Finally, we implement the framework to compute an optimal defender strategy. Based on two real-world datasets, the experimental results demonstrate that the DPS framework can delay APT intrusions under equilibrium within three epochs and obtains a better reward than the Uniform policy.
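To make the reinforcement-learning idea concrete, the following is a minimal toy sketch of a defender learning where to tag flows in an information flow graph. It is not the paper's method: it uses a tabular Q-learning update in place of the paper's deep Q-network, a hand-built five-node graph, and assumed costs/rewards; every node name, constant, and the `train` function here are illustrative inventions, not artifacts from the DPS framework.

```python
import random

# Toy information-flow graph: nodes are processes/files, edges are flows.
# All names and values below are illustrative assumptions, not from the paper.
GRAPH = {
    "proc_a": ["file_x", "socket_1"],
    "file_x": ["proc_b"],
    "socket_1": ["proc_b"],
    "proc_b": ["exfil"],  # hypothetical sensitive sink
    "exfil": [],
}

ACTIONS = [0, 1]  # 0 = leave the flow untagged, 1 = tag it (DIFT-style marking)
TAG_COST, DETECT_REWARD = -1.0, 10.0

def step_reward(node, action, on_apt_path):
    # Tagging always costs resources; it pays off only if the APT actually
    # traverses the tagged node (a crude stand-in for the nonzero-sum,
    # average-reward game in the paper).
    r = TAG_COST if action == 1 else 0.0
    if action == 1 and on_apt_path:
        r += DETECT_REWARD
    return r

def train(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(n, a): 0.0 for n in GRAPH for a in ACTIONS}
    apt_path = {"proc_a", "file_x", "proc_b"}  # assumed fixed APT route
    for _ in range(episodes):
        node = "proc_a"
        while GRAPH[node]:  # walk until a sink is reached
            # epsilon-greedy choice between tagging and not tagging
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(node, x)])
            nxt = rng.choice(GRAPH[node])
            r = step_reward(node, a, node in apt_path)
            best_next = max(q[(nxt, b)] for b in ACTIONS)
            # standard one-step Q-learning update
            q[(node, a)] += alpha * (r + gamma * best_next - q[(node, a)])
            node = nxt
    return q
```

After training, the learned policy tags a node exactly when `q[(node, 1)] > q[(node, 0)]`; in this toy setup that separates nodes on the assumed APT route from bystanders such as `socket_1`, which only ever incur the tagging cost.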