Raiju: Reinforcement learning-guided post-exploitation for automating security assessment of network systems

IF 4.4, Zone 2 (Computer Science), JCR Q1 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

Computer Networks, Pub Date: 2024-08-09, DOI: 10.1016/j.comnet.2024.110706

Citations: 0

Abstract

Raiju: Reinforcement learning-guided post-exploitation for automating security assessment of network systems

Investigating how attackers behave after a successful exploit, a phase called post-exploitation, is an important part of discovering threats to a network system. Although various efficient tools support post-exploitation, completing the process still depends on experienced human experts, known as penetration testers or pen-testers. This study proposes Raiju, a Reinforcement Learning (RL)-driven framework that automatically carries out the steps of the post-exploitation phase for security-level evaluation. We implement two well-known RL algorithms, Advantage Actor–Critic (A2C) and Proximal Policy Optimization (PPO), to evaluate specialized agents capable of taking intelligent actions. With the support of Metasploit, the modules corresponding to the agent's selected actions automatically launch real privilege escalation (PE), hashdump gathering (GH), and lateral movement (LM) attacks on multiple platforms. By leveraging RL, we aim to produce agents that autonomously select suitable actions to exploit vulnerabilities in target systems. This approach automates specific components of the penetration testing (PT) workflow, improving its efficiency and its adaptability to evolving threats and vulnerabilities. The experiments are performed in four real environments, with agents trained over thousands of episodes. The agents automatically launch exploits in all four environments and achieve a success ratio of over 84% across the three attack types. Furthermore, our experiments demonstrate the effectiveness of the A2C algorithm for post-exploitation automation.
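As a rough illustration of the actor-critic idea the abstract refers to, the following is a minimal sketch and not the Raiju implementation. It assumes a fully simulated three-step chain in which the labels PE, GH, and LM are only names for abstract actions; no hosts, tools, or Metasploit modules are involved. The environment, rewards, hyperparameters, and every function name (`step`, `train`, `greedy_policy`) are hypothetical. The update shown is a one-step tabular advantage actor-critic, which is simpler than the batched A2C and PPO algorithms the paper evaluates.

```python
import math
import random

# Hypothetical, fully simulated environment: no real systems are touched.
ACTIONS = ["PE", "GH", "LM"]      # labels borrowed from the abstract
CORRECT = {0: 0, 1: 1, 2: 2}      # assumed: one useful action per state
TERMINAL = 3


def step(state, action):
    """Return (next_state, reward) for the toy chain."""
    if action == CORRECT[state]:
        nxt = state + 1
        return nxt, (10.0 if nxt == TERMINAL else 0.0)
    return state, -1.0            # wasted attempt: stay put, small penalty


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=2000, gamma=0.9, lr_actor=0.1, lr_critic=0.1,
          max_steps=20, seed=0):
    """Train a tabular actor (logits) and critic (state values)."""
    rng = random.Random(seed)
    theta = [[0.0] * len(ACTIONS) for _ in range(TERMINAL)]  # actor logits
    value = [0.0] * (TERMINAL + 1)   # critic; V(terminal) stays 0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            probs = softmax(theta[state])
            action = rng.choices(range(len(ACTIONS)), weights=probs)[0]
            nxt, reward = step(state, action)
            # One-step advantage estimate: TD error r + gamma*V(s') - V(s).
            advantage = reward + gamma * value[nxt] - value[state]
            value[state] += lr_critic * advantage
            # Softmax policy gradient: d log pi / d logit = onehot - probs.
            for a in range(len(ACTIONS)):
                grad = (1.0 if a == action else 0.0) - probs[a]
                theta[state][a] += lr_actor * advantage * grad
            state = nxt
            if state == TERMINAL:
                break
    return theta, value


def greedy_policy(theta):
    """Return the highest-logit action label for each non-terminal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda a: row[a])]
            for row in theta]
```

Calling `greedy_policy(train()[0])` returns the action the trained actor prefers in each of the three simulated states. The critic's temporal-difference error doubles as the advantage estimate, which keeps the sketch short; a fuller treatment would use multi-step returns and parallel rollouts.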

Source journal

Computer Networks (Engineering and Technology: Telecommunications)

CiteScore: 10.80
Self-citation rate: 3.60%
Articles published: 434
Review time: 8.6 months

About the journal: Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.