LLM4Game: Multi-agent reinforcement learning with knowledge injection for dynamic defense resource allocation in cloud storage

Impact Factor 4.6 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Yixiao Peng, Hao Hu, Feiyang Li, Yingchang Jiang, Jipeng Tang, Yuling Liu
{"title":"LLM4Game: Multi-agent reinforcement learning with knowledge injection for dynamic defense resource allocation in cloud storage","authors":"Yixiao Peng ,&nbsp;Hao Hu ,&nbsp;Feiyang Li ,&nbsp;Yingchang Jiang ,&nbsp;Jipeng Tang ,&nbsp;Yuling Liu","doi":"10.1016/j.comnet.2025.111748","DOIUrl":null,"url":null,"abstract":"<div><div>The non-cooperative and interdependent nature of network attack-defense links it closely to game theory. Current game-theoretic decision-making methods construct game models for attack-defense scenarios and use reinforcement learning (RL) to compute optimal strategies. However, RL relies on the “trial and error” exploration and is likely to fall into the local optimum in some cloud storage environment without game equilibrium. First, in cloud storage systems, the resource investment of attack and defense players has a “winner-takes-all” characteristic. Thus, we employ the Colonel Blotto game to model the attack-defense scenario in cloud storage systems, extending it to a multi-player, heterogeneous battlefield model with asymmetric resources. Second, RL’s reliance on trial-and-error exploration leads to suboptimal convergence in sparse-reward, non-equilibrium conditions. We leverage Large Language Models (LLMs) to inject attack-defense context knowledge, addressing the cold start problem of RL. Finally, we propose the RL-LLM-KI algorithm featuring a precomputation-retrieval mechanism that mitigates the inference speed discrepancy between LLMs and RL agents, enabling real-time defense decisions. Experiments show that our work increases utility by 140 % and 136.36 % compared to MADRL and DRS-DQN respectively in typical experimental scenarios. To our best knowledge, this study is the first to reveal the significant effect of knowledge injection in enhancing decision-making efficacy in highly adversarial cloud storage attack-defense scenarios.</div></div>","PeriodicalId":50637,"journal":{"name":"Computer Networks","volume":"273 ","pages":"Article 111748"},"PeriodicalIF":4.6000,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1389128625007145","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

The non-cooperative and interdependent nature of network attack-defense links it closely to game theory. Current game-theoretic decision-making methods construct game models for attack-defense scenarios and use reinforcement learning (RL) to compute optimal strategies. However, RL relies on trial-and-error exploration and is prone to falling into local optima in cloud storage environments that lack a game equilibrium. First, in cloud storage systems, the resource investment of attackers and defenders has a "winner-takes-all" characteristic. We therefore employ the Colonel Blotto game to model the attack-defense scenario in cloud storage systems, extending it to a multi-player, heterogeneous-battlefield model with asymmetric resources. Second, RL's reliance on trial-and-error exploration leads to suboptimal convergence under sparse-reward, non-equilibrium conditions. We leverage Large Language Models (LLMs) to inject attack-defense context knowledge, addressing RL's cold-start problem. Finally, we propose the RL-LLM-KI algorithm, featuring a precomputation-retrieval mechanism that mitigates the inference-speed discrepancy between LLMs and RL agents, enabling real-time defense decisions. Experiments show that our approach increases utility by 140% and 136.36% over MADRL and DRS-DQN, respectively, in typical experimental scenarios. To the best of our knowledge, this study is the first to reveal the significant effect of knowledge injection in enhancing decision-making efficacy in highly adversarial cloud storage attack-defense scenarios.
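The winner-takes-all contest the abstract describes can be made concrete with a short sketch. The following is a minimal illustration of a two-player Colonel Blotto payoff over heterogeneous battlefields with asymmetric budgets; the battlefield values, budgets, and tie-breaking rule are assumptions for illustration, not the paper's actual formulation.

from typing import Sequence

def blotto_payoff(attack: Sequence[float],
                  defense: Sequence[float],
                  values: Sequence[float]) -> float:
    """Attacker's total payoff: each battlefield (e.g. a storage node) is
    worth values[i] and goes entirely to whichever side commits more
    resources there -- the winner-takes-all rule."""
    payoff = 0.0
    for a, d, v in zip(attack, defense, values):
        if a > d:      # attacker outbids the defender on this battlefield
            payoff += v
        elif a < d:    # defender holds the battlefield
            payoff -= v
        # ties contribute nothing under this (assumed) tie-breaking rule
    return payoff

# Asymmetric resources: attacker budget 10 vs. defender budget 15.
attack_alloc  = [5.0, 1.0, 4.0]
defense_alloc = [3.0, 6.0, 6.0]
node_values   = [2.0, 1.0, 3.0]   # heterogeneous battlefield values
print(blotto_payoff(attack_alloc, defense_alloc, node_values))  # 2 - 1 - 3 = -2.0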
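The precomputation-retrieval mechanism can likewise be sketched at a high level: LLM inference is too slow to run at every RL step, so one plausible reading is that guidance is generated offline for representative states and retrieved by nearest-neighbour lookup at decision time. Everything below, including the toy_prior stand-in for the LLM call and the state-distance keying, is a hypothetical sketch rather than the RL-LLM-KI implementation.

import math
from typing import Callable, Dict, List, Tuple

State = Tuple[float, ...]   # e.g. normalized per-node threat/resource features

class AdviceCache:
    def __init__(self, query_llm_for_prior: Callable[[State], List[float]]):
        self._query = query_llm_for_prior
        self._cache: Dict[State, List[float]] = {}

    def precompute(self, representative_states: List[State]) -> None:
        # Offline phase: pay the slow LLM inference cost once per state.
        for s in representative_states:
            self._cache[s] = self._query(s)

    def retrieve(self, state: State) -> List[float]:
        # Online phase: nearest-neighbour lookup, fast enough for RL stepping.
        nearest = min(self._cache, key=lambda s: math.dist(s, state))
        return self._cache[nearest]

# Toy prior (hypothetical): bias defense toward the most threatened nodes.
def toy_prior(state: State) -> List[float]:
    total = sum(state) or 1.0
    return [x / total for x in state]

cache = AdviceCache(toy_prior)
cache.precompute([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)])
print(cache.retrieve((0.9, 0.1, 0.0)))  # -> prior cached for (1.0, 0.0, 0.0)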
Source journal
Computer Networks (Engineering & Technology – Telecommunications)
CiteScore: 10.80
Self-citation rate: 3.60%
Articles per year: 434
Review time: 8.6 months
About the journal: Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.