S2R-CMI: A Robust Deep Reinforcement Learning method based on counterfactual estimation and state importance evaluation under additive noise disturbance

Impact Factor: 5.5 · CAS Zone 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhenyuan Chen, Zhi Zheng, Wenjun Huang, Xiaomin Lin
{"title":"S2R-CMI: A Robust Deep Reinforcement Learning method based on counterfactual estimation and state importance evaluation under additive noise disturbance","authors":"Zhenyuan Chen ,&nbsp;Zhi Zheng ,&nbsp;Wenjun Huang ,&nbsp;Xiaomin Lin","doi":"10.1016/j.neucom.2025.130642","DOIUrl":null,"url":null,"abstract":"<div><div>The development of Deep Reinforcement Learning (DRL) overcomes the limitations of traditional reinforcement learning in discrete spaces, extending its applications to scenarios in continuous spaces. However, disturbance commonly encountered in real-world environments poses significant threats to the performance of DRL algorithms, potentially leading to erroneous decision-making by agents and severe consequences. To address this issue, this paper proposes a Robust Deep Reinforcement Learning (RDRL) method named S2R-CMI, which aims to mitigate the impact of additive noise in the state space on DRL performance without requiring prior knowledge of the noise disturbance. Specifically, a state-based and reward-based conditional mutual information mechanism is designed to dynamically capture state importance and estimate its contribution to rewards. To address the lack of counterfactual data during training, a counterfactual label estimation method is proposed to approximate the counterfactual reward distribution while avoiding local optima during network training. State importance is then evaluated to quantify the impact of disturbance on states. Finally, we validate the proposed method in five scenarios: Cartpole, LunarLander-Discrete, LunarLander-Continuous, Build-Marine, and Half-Cheetah, under state disturbance. Experimental results demonstrate that S2R-CMI significantly enhances the robustness of DRL algorithms. Furthermore, we conduct experiments in some scenarios without state disturbance, and the results indicate that the method also achieves strong performance, further verifying its superiority and generalization capabilities.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130642"},"PeriodicalIF":5.5000,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225013141","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The development of Deep Reinforcement Learning (DRL) overcomes the limitations of traditional reinforcement learning in discrete spaces, extending its applications to scenarios in continuous spaces. However, disturbances commonly encountered in real-world environments pose a significant threat to the performance of DRL algorithms, potentially leading to erroneous decision-making by agents and severe consequences. To address this issue, this paper proposes a Robust Deep Reinforcement Learning (RDRL) method named S2R-CMI, which aims to mitigate the impact of additive noise in the state space on DRL performance without requiring prior knowledge of the noise disturbance. Specifically, a state-based and reward-based conditional mutual information mechanism is designed to dynamically capture state importance and estimate each state's contribution to rewards. To address the lack of counterfactual data during training, a counterfactual label estimation method is proposed to approximate the counterfactual reward distribution while avoiding local optima during network training. State importance is then evaluated to quantify the impact of disturbance on states. Finally, we validate the proposed method under state disturbance in five scenarios: Cartpole, LunarLander-Discrete, LunarLander-Continuous, Build-Marine, and Half-Cheetah. Experimental results demonstrate that S2R-CMI significantly enhances the robustness of DRL algorithms. Furthermore, we conduct experiments in several scenarios without state disturbance, and the results indicate that the method also achieves strong performance, further verifying its superiority and generalization capability.
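To make the core idea concrete, the sketch below illustrates one simple way to score state importance from the dependence between state features and rewards. It is not the authors' implementation: S2R-CMI uses a learned conditional mutual information mechanism together with counterfactual reward labels, whereas this toy version substitutes a plain histogram-based mutual information estimate over a batch of transitions. All names (estimate_state_importance, n_bins, the toy environment) are hypothetical and chosen only for illustration.

```python
# Minimal sketch (NOT the paper's method): score each state dimension by the
# estimated mutual information between that dimension and the observed reward,
# then normalize the scores into importance weights. Dimensions that carry more
# information about the reward are treated as more important, and hence as the
# ones most worth protecting against additive noise.
import numpy as np

def _mutual_information(x: np.ndarray, y: np.ndarray, n_bins: int = 16) -> float:
    """Plug-in histogram estimate of I(X; Y) for two 1-D variables."""
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over X
    py = pxy.sum(axis=0, keepdims=True)   # marginal over Y
    nz = pxy > 0                          # avoid log(0) on empty bins
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def estimate_state_importance(states: np.ndarray, rewards: np.ndarray,
                              n_bins: int = 16) -> np.ndarray:
    """Return one normalized importance weight per state dimension.

    states:  (batch, state_dim) array of observed states
    rewards: (batch,) array of rewards received after those states
    """
    mi = np.array([_mutual_information(states[:, d], rewards, n_bins)
                   for d in range(states.shape[1])])
    return mi / (mi.sum() + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(4096, 4))
    # Toy reward: depends strongly on dimension 0, weakly on dimension 2.
    reward = 2.0 * batch[:, 0] + 0.3 * batch[:, 2] + rng.normal(scale=0.1, size=4096)
    print(estimate_state_importance(batch, reward))  # weight on dim 0 dominates
```

In the paper's setting, the importance weights would additionally be conditioned on actions and computed against estimated counterfactual rewards rather than raw observed rewards; the normalization step, however, reflects the same intuition of quantifying how much each part of the state contributes to the return.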
Source journal: Neurocomputing (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles per year: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.