A Novel Heuristic Exploration Method Based on Action Effectiveness Constraints to Relieve Loop Enhancement Effect in Reinforcement Learning with Sparse Rewards

IF 4.3 | CAS Tier 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhenghongyuan Ni, Ye Jin, Peng Liu, Wei Zhao
{"title":"A Novel Heuristic Exploration Method Based on Action Effectiveness Constraints to Relieve Loop Enhancement Effect in Reinforcement Learning with Sparse Rewards","authors":"Zhenghongyuan Ni, Ye Jin, Peng Liu, Wei Zhao","doi":"10.1007/s12559-023-10226-4","DOIUrl":null,"url":null,"abstract":"<p>In realistic sparse reward tasks, existing theoretical methods cannot be effectively applied due to the low sampling probability ofrewarded episodes. Profound research on methods based on intrinsic rewards has been conducted to address this issue, but exploration with sparse rewards remains a great challenge. This paper describes the loop enhancement effect in exploration processes with sparse rewards. After each fully trained iteration, the execution probability of ineffective actions is higher than thatof other suboptimal actions, which violates biological habitual behavior principles and is not conducive to effective training. This paper proposes corresponding theorems of relieving the loop enhancement effect in the exploration process with sparse rewards and a heuristic exploration method based on action effectiveness constraints (AEC), which improves policy training efficiency by relieving the loop enhancement effect. Inspired by the fact that animals form habitual behaviors and goal-directed behaviors through the dorsolateral striatum and dorsomedial striatum. The function of the dorsolateral striatum is simulated by an action effectiveness evaluation mechanism (A2EM), which aims to reduce the rate of ineffective samples and improve episode reward expectations. The function of the dorsomedial striatum is simulated by an agent policy network, which aims to achieve task goals. The iterative training of A2EM and the policy forms the AEC model structure. A2EM provides effective samples for the agent policy; the agent policy provides training constraints for A2EM. The experimental results show that A2EM can relieve the loop enhancement effect and has good interpretability and generalizability. AEC enables agents to effectively reduce the loop rate in samples, can collect more effective samples, and improve the efficiency of policy training. The performance of AEC demonstrates the effectiveness of a biological heuristic approach that simulates the function of the dorsal striatum. This approach can be used to improve the robustness of agent exploration with sparse rewards.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"46 1","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Computation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s12559-023-10226-4","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

In realistic sparse-reward tasks, existing theoretical methods cannot be applied effectively because the sampling probability of rewarded episodes is low. Methods based on intrinsic rewards have been studied extensively to address this issue, but exploration with sparse rewards remains a great challenge. This paper describes the loop enhancement effect that arises in exploration with sparse rewards: after each fully trained iteration, the execution probability of ineffective actions is higher than that of other suboptimal actions, which violates principles of biological habitual behavior and is not conducive to effective training. This paper proposes theorems for relieving the loop enhancement effect during exploration with sparse rewards and a heuristic exploration method based on action effectiveness constraints (AEC), which improves policy training efficiency by relieving this effect. The method is inspired by the fact that animals form habitual behaviors and goal-directed behaviors through the dorsolateral and dorsomedial striatum, respectively. The function of the dorsolateral striatum is simulated by an action effectiveness evaluation mechanism (A2EM), which aims to reduce the rate of ineffective samples and raise the expected episode reward. The function of the dorsomedial striatum is simulated by the agent's policy network, which aims to achieve the task goal. Iterative training of A2EM and the policy forms the AEC model structure: A2EM provides effective samples for the agent policy, and the agent policy provides training constraints for A2EM. The experimental results show that A2EM relieves the loop enhancement effect and has good interpretability and generalizability. AEC enables agents to reduce the loop rate in collected samples, gather more effective samples, and improve the efficiency of policy training. The performance of AEC demonstrates the effectiveness of a biologically inspired approach that simulates the function of the dorsal striatum, and it can be used to improve the robustness of agent exploration with sparse rewards.
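The abstract describes the AEC structure only at a high level: A2EM screens out loop-inducing ("ineffective") transitions so that the policy is trained on a larger share of effective samples. The sketch below is a minimal illustration of that filtering idea, not the authors' implementation; the toy environment, the random behavior policy, the history-based loop criterion, and all names (ToyChainEnv, A2EMHeuristic, collect_effective_samples) are assumptions introduced for illustration, whereas the paper's A2EM is a learned module trained iteratively with the policy.

```python
# Minimal sketch (not the authors' code) of filtering loop-closing transitions.
# Assumption: a transition is "ineffective" if it returns to a recently visited
# state; the paper instead learns this judgment with A2EM.

import random
from collections import deque


class ToyChainEnv:
    """1-D chain with a sparse reward: +1 only at the rightmost state."""

    def __init__(self, length=10):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = left, 1 = right
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


class A2EMHeuristic:
    """Stand-in for A2EM: flags transitions that revisit a recent state."""

    def __init__(self, history_len=8):
        self.recent = deque(maxlen=history_len)

    def is_effective(self, next_state):
        return next_state not in self.recent

    def observe(self, state):
        self.recent.append(state)


def collect_effective_samples(env, a2em, episode_len=50):
    """Roll out a random behavior policy, keeping only effective transitions."""
    state = env.reset()
    a2em.observe(state)
    effective, total = [], 0
    for _ in range(episode_len):
        action = random.randint(0, 1)
        next_state, reward, done = env.step(action)
        total += 1
        if a2em.is_effective(next_state):  # drop loop-closing samples
            effective.append((state, action, reward, next_state, done))
        a2em.observe(next_state)
        state = next_state
        if done:
            break
    return effective, total


if __name__ == "__main__":
    env, a2em = ToyChainEnv(), A2EMHeuristic()
    kept, seen = collect_effective_samples(env, a2em)
    print(f"kept {len(kept)} of {seen} transitions (loop rate ~ {1 - len(kept) / seen:.2f})")
```

In the paper's full AEC structure, the filtered (effective) samples would be used to update the policy network, and the policy in turn constrains how the effectiveness evaluator is trained; the heuristic above only illustrates the sample-screening half of that loop.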


Source journal: Cognitive Computation (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; NEUROSCIENCES)
CiteScore: 9.30
Self-citation rate: 3.70%
Articles published: 116
Review time: >12 weeks
Journal description: Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically-inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.