Using Deep Reinforcement Learning for Assessing the Consequences of Cyber Mitigation Techniques on Industrial Control Systems

Terry Merz, Romarie Morales Rosado
International Conference on Cyber Warfare and Security. Published 2023-02-28. DOI: 10.34190/iccws.18.1.1063 (https://doi.org/10.34190/iccws.18.1.1063)

Abstract

This paper discusses an in-progress study on the use of deep reinforcement learning (DRL) to mitigate the effects of an advanced cyber-attack against industrial control systems (ICS). The research is a qualitative, exploratory study that addresses a gap which emerged during the execution of two rapid prototyping studies. During these studies, cyber defensive procedures, known as “Mitigation”, were characterized as actions taken to minimize the impact of ongoing advanced cyber-attacks against an ICS while enabling primary operations to continue. To execute Mitigation procedures, affected ICS components required rapid isolation and quarantining from “healthy” system segments. Today, however, with most attacks leveraging automation, mitigation also requires rapid decision-making capabilities that operate at the speed of automation yet with human-like refinement. The authors settled on DRL as a viable solution to this problem because the algorithm's design involves “intelligent” decisions based upon continuous learning achieved through a rewards system. The primary theory of this study posits that processes informed by data sources related to the execution path of an advanced cyber-attack, as well as by the consequences of deploying a particular Mitigation procedure, evolve the system into an ever-improving defensive capability. This study seeks to produce a defensive DRL-based software agent, trained by a DRL-based offensive software agent, that generates policy refinements based upon extrapolations from a corrupted network state as reported by an intrusion detection system (IDS) and from baseline data. Anticipated results include an estimation rule that would quantify the impacts of various mitigation actions while protecting the operational critical path and isolating an in-progress attack. This study is in a conceptual phase and development has not started.
The research questions for this study are:
RQ1: Can this software agent correctly categorize an in-progress cyber-attack and extrapolate the potential ICS assets affected?
RQ2: Can this software agent categorize novel cyber-attacks and extrapolate a probable attack vector while enumerating affected assets?
RQ3: Can this software agent characterize how operations are affected by quarantine actions?
RQ4: Can this software agent generate a set of recommended courses of action, ranked by effectiveness and by least negative effect on the operational critical path?
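To make RQ3 and RQ4 more concrete, the following is a minimal sketch of how candidate quarantine actions might be scored and ranked on a toy network. It is not the authors' estimation rule, and it involves no DRL: the study is conceptual and the abstract specifies neither. The topology, the asset names, the hand-written `score` formula, and the `weight` parameter are all hypothetical, chosen only to illustrate the trade-off between containing an attack and keeping the operational critical path intact.

```python
# Hypothetical toy ICS topology: node -> neighbours (undirected links).
TOPOLOGY = {
    "hmi": {"plc1", "historian"},
    "plc1": {"hmi", "pump"},
    "pump": {"plc1"},
    "historian": {"hmi", "eng_ws"},
    "eng_ws": {"historian"},
}
CRITICAL_PATH = {"hmi", "plc1", "pump"}  # assets primary operations depend on (assumed)
COMPROMISED = {"eng_ws"}                 # assets flagged by an IDS (assumed)


def reachable(start, removed):
    """Return nodes reachable from `start` when `removed` nodes are quarantined."""
    if start in removed:
        return set()
    seen, stack = {start}, [start]
    while stack:
        for nxt in TOPOLOGY[stack.pop()]:
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def score(quarantine, weight=2.0):
    """Illustrative rule: protected healthy assets minus weighted critical-path loss."""
    exposed = set()
    for node in COMPROMISED - quarantine:
        exposed |= reachable(node, quarantine)
    exposed -= COMPROMISED
    healthy = set(TOPOLOGY) - COMPROMISED
    # Healthy assets that stay online and are cut off from the attacker.
    protected = len(healthy - exposed - quarantine)
    # Critical-path assets taken offline by the quarantine itself.
    cost = len(CRITICAL_PATH & quarantine)
    return protected - weight * cost


def rank_actions(candidates):
    """Sort candidate quarantine sets from best to worst score."""
    scored = [(name, score(nodes)) for name, nodes in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


CANDIDATES = {
    "do_nothing": frozenset(),
    "isolate_eng_ws": frozenset({"eng_ws"}),
    "isolate_historian": frozenset({"historian"}),
    "isolate_hmi": frozenset({"hmi"}),
}

if __name__ == "__main__":
    for name, value in rank_actions(CANDIDATES):
        print(f"{name}: {value}")
```

In this toy setup, isolating the compromised workstation ranks highest because it protects every healthy asset without touching the critical path, while isolating the HMI is penalized for taking a critical-path asset offline. In the study's proposed design, a learned policy would presumably replace the fixed formula; that substitution is an assumption here, not something the abstract describes.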