S2R-CMI: A Robust Deep Reinforcement Learning method based on counterfactual estimation and state importance evaluation under additive noise disturbance

Impact Factor: 5.5 · CAS Zone 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhenyuan Chen, Zhi Zheng, Wenjun Huang, Xiaomin Lin
{"title":"S2R-CMI: A Robust Deep Reinforcement Learning method based on counterfactual estimation and state importance evaluation under additive noise disturbance","authors":"Zhenyuan Chen ,&nbsp;Zhi Zheng ,&nbsp;Wenjun Huang ,&nbsp;Xiaomin Lin","doi":"10.1016/j.neucom.2025.130642","DOIUrl":null,"url":null,"abstract":"<div><div>The development of Deep Reinforcement Learning (DRL) overcomes the limitations of traditional reinforcement learning in discrete spaces, extending its applications to scenarios in continuous spaces. However, disturbance commonly encountered in real-world environments poses significant threats to the performance of DRL algorithms, potentially leading to erroneous decision-making by agents and severe consequences. To address this issue, this paper proposes a Robust Deep Reinforcement Learning (RDRL) method named S2R-CMI, which aims to mitigate the impact of additive noise in the state space on DRL performance without requiring prior knowledge of the noise disturbance. Specifically, a state-based and reward-based conditional mutual information mechanism is designed to dynamically capture state importance and estimate its contribution to rewards. To address the lack of counterfactual data during training, a counterfactual label estimation method is proposed to approximate the counterfactual reward distribution while avoiding local optima during network training. State importance is then evaluated to quantify the impact of disturbance on states. Finally, we validate the proposed method in five scenarios: Cartpole, LunarLander-Discrete, LunarLander-Continuous, Build-Marine, and Half-Cheetah, under state disturbance. Experimental results demonstrate that S2R-CMI significantly enhances the robustness of DRL algorithms. Furthermore, we conduct experiments in some scenarios without state disturbance, and the results indicate that the method also achieves strong performance, further verifying its superiority and generalization capabilities.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130642"},"PeriodicalIF":5.5000,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225013141","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The development of Deep Reinforcement Learning (DRL) overcomes the limitations of traditional reinforcement learning in discrete spaces, extending its applications to scenarios in continuous spaces. However, disturbances commonly encountered in real-world environments pose a significant threat to the performance of DRL algorithms, potentially leading to erroneous decision-making by agents and severe consequences. To address this issue, this paper proposes a Robust Deep Reinforcement Learning (RDRL) method named S2R-CMI, which aims to mitigate the impact of additive noise in the state space on DRL performance without requiring prior knowledge of the noise disturbance. Specifically, a state-based and reward-based conditional mutual information mechanism is designed to dynamically capture state importance and estimate each state's contribution to rewards. To address the lack of counterfactual data during training, a counterfactual label estimation method is proposed to approximate the counterfactual reward distribution while avoiding local optima during network training. State importance is then evaluated to quantify the impact of disturbance on states. Finally, we validate the proposed method under state disturbance in five scenarios: Cartpole, LunarLander-Discrete, LunarLander-Continuous, Build-Marine, and Half-Cheetah. Experimental results demonstrate that S2R-CMI significantly enhances the robustness of DRL algorithms. Furthermore, we conduct experiments in several scenarios without state disturbance, and the results indicate that the method also achieves strong performance, further verifying its superiority and generalization capability.
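To make the core idea concrete, the sketch below illustrates one simple way to score state importance from the dependence between state features and rewards. It is not the authors' implementation: S2R-CMI uses a learned conditional mutual information mechanism together with counterfactual reward labels, whereas this toy version substitutes a plain histogram-based mutual information estimate over a batch of transitions. All names (estimate_state_importance, n_bins, the toy environment) are hypothetical and chosen only for illustration.

```python
# Minimal sketch (NOT the paper's method): score each state dimension by the
# estimated mutual information between that dimension and the observed reward,
# then normalize the scores into importance weights. Dimensions that carry more
# information about the reward are treated as more important, and hence as the
# ones most worth protecting against additive noise.
import numpy as np

def _mutual_information(x: np.ndarray, y: np.ndarray, n_bins: int = 16) -> float:
    """Plug-in histogram estimate of I(X; Y) for two 1-D variables."""
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over X
    py = pxy.sum(axis=0, keepdims=True)   # marginal over Y
    nz = pxy > 0                          # avoid log(0) on empty bins
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def estimate_state_importance(states: np.ndarray, rewards: np.ndarray,
                              n_bins: int = 16) -> np.ndarray:
    """Return one normalized importance weight per state dimension.

    states:  (batch, state_dim) array of observed states
    rewards: (batch,) array of rewards received after those states
    """
    mi = np.array([_mutual_information(states[:, d], rewards, n_bins)
                   for d in range(states.shape[1])])
    return mi / (mi.sum() + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(4096, 4))
    # Toy reward: depends strongly on dimension 0, weakly on dimension 2.
    reward = 2.0 * batch[:, 0] + 0.3 * batch[:, 2] + rng.normal(scale=0.1, size=4096)
    print(estimate_state_importance(batch, reward))  # weight on dim 0 dominates
```

In the paper's setting, the importance weights would additionally be conditioned on actions and computed against estimated counterfactual rewards rather than raw observed rewards; the normalization step, however, reflects the same intuition of quantifying how much each part of the state contributes to the return.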
Source journal: Neurocomputing (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles per year: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.