Applying Action Masking and Curriculum Learning Techniques to Improve Data Efficiency and Overall Performance in Operational Technology Cyber Security using Reinforcement Learning

Alec Wilson, William Holmes, Ryan Menzies, Kez Smithson Whitehead
arXiv:2409.10563 · arXiv - CS - Cryptography and Security · Published 2024-09-13
Citations: 0

Abstract

In previous work, the IPMSRL environment (Integrated Platform Management System Reinforcement Learning environment) was developed with the aim of training defensive RL agents in a simulator representing a subset of an IPMS on a maritime vessel under a cyber-attack. This paper extends the use of IPMSRL to enhance realism including the additional dynamics of false positive alerts and alert delay. Applying curriculum learning, in the most difficult environment tested, resulted in an episode reward mean increasing from a baseline result of -2.791 to -0.569. Applying action masking, in the most difficult environment tested, resulted in an episode reward mean increasing from a baseline result of -2.791 to -0.743. Importantly, this level of performance was reached in less than 1 million timesteps, which was far more data efficient than vanilla PPO which reached a lower level of performance after 2.5 million timesteps. The training method which resulted in the highest level of performance observed in this paper was a combination of the application of curriculum learning and action masking, with a mean episode reward of 0.137. This paper also introduces a basic hardcoded defensive agent encoding a representation of cyber security best practice, which provides context to the episode reward mean figures reached by the RL agents. The hardcoded agent managed an episode reward mean of -1.895. This paper therefore shows that applications of curriculum learning and action masking, both independently and in tandem, present a way to overcome the complex real-world dynamics that are present in operational technology cyber security threat remediation.
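To make the two techniques named in the abstract concrete, here is a minimal sketch of how action masking and curriculum learning are commonly implemented in RL training loops. This is an illustration of the general techniques only, not the paper's actual IPMSRL code: the stage parameters (false-positive rate, alert delay), the promotion threshold, and all function names are hypothetical.

```python
import math

def masked_softmax(logits, mask):
    """Action masking: zero out the probability of invalid actions.

    logits: per-action preferences from the policy network.
    mask:   booleans, True where the action is valid in the current state.
    Masked logits are set to -inf before the softmax, so the agent can
    never sample a remediation action that does not apply right now.
    Assumes at least one action is valid.
    """
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) if l != float("-inf") else 0.0
            for l in masked]
    total = sum(exps)
    return [e / total for e in exps]

class Curriculum:
    """Curriculum learning: advance through progressively harder
    environment configurations once the agent's recent mean episode
    reward clears a promotion threshold. Stage values below are
    illustrative placeholders, not the paper's settings.
    """

    STAGES = [
        {"false_positive_rate": 0.0, "alert_delay": 0},  # easiest
        {"false_positive_rate": 0.1, "alert_delay": 1},
        {"false_positive_rate": 0.3, "alert_delay": 3},  # hardest
    ]

    def __init__(self, threshold=-0.5, window=10):
        self.threshold = threshold  # mean reward needed to promote
        self.window = window        # episodes averaged per check
        self.stage = 0
        self.rewards = []

    def report(self, episode_reward):
        """Record an episode result; promote if the window's mean clears."""
        self.rewards.append(episode_reward)
        recent = self.rewards[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.threshold
                and self.stage < len(self.STAGES) - 1):
            self.stage += 1
            self.rewards = []  # restart the window for the new stage

    def config(self):
        """Environment parameters for the current difficulty stage."""
        return self.STAGES[self.stage]
```

In a training loop, the environment would be rebuilt from `Curriculum.config()` whenever the stage advances, and `masked_softmax` would replace the plain softmax in the policy's action-sampling step.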