Using Deep Reinforcement Learning for Assessing the Consequences of Cyber Mitigation Techniques on Industrial Control Systems

International Conference on Cyber Warfare and Security. Pub Date: 2023-02-28. DOI: 10.34190/iccws.18.1.1063

Terry Merz, Romarie Morales Rosado



Abstract

This paper discusses an in-progress study on the use of deep reinforcement learning (DRL) to mitigate the effects of an advanced cyber-attack against industrial control systems (ICS). The research is a qualitative, exploratory study that addresses a gap which emerged during the execution of two rapid prototyping studies. During these studies, cyber defensive procedures, known as “Mitigation,” were characterized as actions taken to minimize the impact of ongoing advanced cyber-attacks against an ICS while enabling primary operations to continue. To execute Mitigation procedures, affected ICS components required rapid isolation and quarantining from “healthy” system segments. Today, however, with most attacks leveraging automation, mitigation also requires rapid decision-making capabilities that operate at the speed of automation yet with human-like refinement. The authors chose DRL as a viable solution to this problem because of the algorithm’s design, which involves “intelligent” decisions based upon continuous learning achieved through a rewards system. The primary theory of this study posits that processes informed by data sources relative to the execution path of an advanced cyber-attack, as well as by the consequences of deploying a particular Mitigation procedure, evolve the system into an ever-improving defensive capability. This study seeks to produce a defensive DRL-based software agent trained by a DRL-based offensive software agent that generates policy refinements based upon extrapolations from a corrupted network state as reported by an intrusion detection system (IDS) and baseline data. Anticipated results include an estimation rule that would quantify the impacts of various mitigation actions while protecting the operational critical path and isolating an in-progress attack. This study is in a conceptual phase and development has not started.

The research questions for this study are:
RQ1: Can this software agent correctly categorize an in-progress cyber-attack and extrapolate the potential ICS assets affected?
RQ2: Can this software agent categorize novel cyber-attacks and extrapolate a probable attack vector while enumerating affected assets?
RQ3: Can this software agent characterize how operations are affected by quarantine actions?
RQ4: Can this software agent generate a set of recommended courses of action ranked by effectiveness and by least negative effect on the operational critical path?
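The abstract states that development has not started, so no implementation exists. Purely as an illustration of what RQ4 asks for, the following minimal sketch ranks candidate quarantine actions with a hand-written scoring rule: the fraction of compromised assets contained, minus a weighted fraction of the operational critical path disrupted. This is not DRL and not the authors' estimation rule; the names `MitigationAction`, `score`, `rank_actions`, `impact_weight`, and all asset names are hypothetical. In a DRL setting, a score of this kind might plausibly serve as a reward signal, but that is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MitigationAction:
    """Hypothetical candidate action: a named set of assets to quarantine."""
    name: str
    quarantined: frozenset


def score(action, compromised, critical_path, impact_weight=1.0):
    """Assumed scoring rule: containment benefit minus weighted operational cost.

    Both terms are fractions in [0, 1]; an empty set contributes 0.
    """
    contained = (
        len(action.quarantined & compromised) / len(compromised)
        if compromised else 0.0
    )
    disrupted = (
        len(action.quarantined & critical_path) / len(critical_path)
        if critical_path else 0.0
    )
    return contained - impact_weight * disrupted


def rank_actions(actions, compromised, critical_path, impact_weight=1.0):
    """Return the actions sorted from highest to lowest score."""
    return sorted(
        actions,
        key=lambda a: score(a, compromised, critical_path, impact_weight),
        reverse=True,
    )


# Illustrative, made-up network state (e.g. as an IDS might report it).
compromised = frozenset({"hmi-2", "plc-3"})
critical_path = frozenset({"plc-1", "plc-3", "pump-a"})

actions = [
    MitigationAction("isolate_hmi", frozenset({"hmi-2"})),
    MitigationAction("isolate_cell", frozenset({"hmi-2", "plc-3"})),
    MitigationAction(
        "shutdown_all", frozenset({"hmi-2", "plc-1", "plc-3", "pump-a"})
    ),
]

for action in rank_actions(actions, compromised, critical_path):
    print(action.name, round(score(action, compromised, critical_path), 3))
```

Raising `impact_weight` makes the ranking more protective of the critical path: with a weight of 2.0, isolating only the HMI outranks isolating the whole cell, because the latter takes a critical-path PLC offline.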

