S2R-CMI: A Robust Deep Reinforcement Learning method based on counterfactual estimation and state importance evaluation under additive noise disturbance
Impact Factor: 5.5 · JCR Q1, Computer Science, Artificial Intelligence
{"title":"S2R-CMI: A Robust Deep Reinforcement Learning method based on counterfactual estimation and state importance evaluation under additive noise disturbance","authors":"Zhenyuan Chen , Zhi Zheng , Wenjun Huang , Xiaomin Lin","doi":"10.1016/j.neucom.2025.130642","DOIUrl":null,"url":null,"abstract":"<div><div>The development of Deep Reinforcement Learning (DRL) overcomes the limitations of traditional reinforcement learning in discrete spaces, extending its applications to scenarios in continuous spaces. However, disturbance commonly encountered in real-world environments poses significant threats to the performance of DRL algorithms, potentially leading to erroneous decision-making by agents and severe consequences. To address this issue, this paper proposes a Robust Deep Reinforcement Learning (RDRL) method named S2R-CMI, which aims to mitigate the impact of additive noise in the state space on DRL performance without requiring prior knowledge of the noise disturbance. Specifically, a state-based and reward-based conditional mutual information mechanism is designed to dynamically capture state importance and estimate its contribution to rewards. To address the lack of counterfactual data during training, a counterfactual label estimation method is proposed to approximate the counterfactual reward distribution while avoiding local optima during network training. State importance is then evaluated to quantify the impact of disturbance on states. Finally, we validate the proposed method in five scenarios: Cartpole, LunarLander-Discrete, LunarLander-Continuous, Build-Marine, and Half-Cheetah, under state disturbance. Experimental results demonstrate that S2R-CMI significantly enhances the robustness of DRL algorithms. Furthermore, we conduct experiments in some scenarios without state disturbance, and the results indicate that the method also achieves strong performance, further verifying its superiority and generalization capabilities.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130642"},"PeriodicalIF":5.5000,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225013141","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
The development of Deep Reinforcement Learning (DRL) overcomes the limitations of traditional reinforcement learning in discrete spaces, extending its applications to continuous spaces. However, disturbances commonly encountered in real-world environments pose a significant threat to the performance of DRL algorithms, potentially leading agents to erroneous decisions with severe consequences. To address this issue, this paper proposes a Robust Deep Reinforcement Learning (RDRL) method named S2R-CMI, which mitigates the impact of additive noise in the state space on DRL performance without requiring prior knowledge of the noise disturbance. Specifically, a state- and reward-based conditional mutual information mechanism is designed to dynamically capture state importance and estimate each state's contribution to rewards. To address the lack of counterfactual data during training, a counterfactual label estimation method is proposed that approximates the counterfactual reward distribution while avoiding local optima during network training. State importance is then evaluated to quantify the impact of disturbances on states. Finally, we validate the proposed method under state disturbance in five scenarios: CartPole, LunarLander-Discrete, LunarLander-Continuous, Build-Marine, and Half-Cheetah. Experimental results demonstrate that S2R-CMI significantly enhances the robustness of DRL algorithms. Furthermore, experiments in scenarios without state disturbance show that the method also performs strongly, further verifying its superiority and generalization capability.
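The abstract does not give the estimator itself, but the core idea it describes, scoring state dimensions by their mutual information with the reward so that noise-dominated dimensions can be down-weighted, can be sketched. The Python snippet below is an illustrative assumption, not the authors' S2R-CMI implementation: the quantile binning, the plug-in MI estimator, the function names (`discretize`, `mutual_information`, `state_importance`), and the use of per-dimension MI as a cheap stand-in for conditional mutual information are all choices made here for clarity.

```python
# A minimal, hypothetical sketch of the idea in the abstract: score state
# dimensions by a (crude, discretized) mutual information with the reward,
# then normalize the scores into importance weights. Everything here is an
# illustrative assumption, not the paper's actual S2R-CMI method.
import numpy as np

def discretize(x, bins=8):
    """Map a 1-D continuous signal onto integer bin indices via quantiles."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, bins + 1)[1:-1])
    return np.digitize(x, edges)

def mutual_information(a, b):
    """Plug-in MI estimate (in nats) between two discrete index sequences."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)          # joint histogram of (a, b)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0                          # avoid log(0) on empty cells
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))

def state_importance(states, rewards, bins=8):
    """Score each state dimension by MI(dimension; reward), normalized.

    The abstract describes *conditional* mutual information; per-dimension
    MI is used here purely as a cheap stand-in estimator.
    """
    r = discretize(rewards, bins)
    scores = np.array([
        mutual_information(discretize(states[:, i], bins), r)
        for i in range(states.shape[1])
    ])
    return scores / scores.sum()

# Synthetic demo: dimension 0 drives the reward, dimension 1 is pure noise.
rng = np.random.default_rng(0)
states = rng.normal(size=(5000, 2))
rewards = states[:, 0] + 0.1 * rng.normal(size=5000)   # additive noise
print(state_importance(states, rewards))   # weight on dim 0 should dominate
```

On this synthetic data the informative dimension receives almost all of the importance mass, which is the behavior the abstract attributes to the state-importance evaluation step; the paper's counterfactual label estimation component is not modeled here.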
Journal Introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The journal covers neurocomputing theory, practice, and applications.