Adaptive exploration network policy for effective exploration in reinforcement learning

Min Li, William Zhu
{"title":"强化学习中有效探索的自适应探索网络策略","authors":"Min Li, William Zhu","doi":"10.1117/12.2667206","DOIUrl":null,"url":null,"abstract":"How to achieve effective exploration is a key issue in the training of Reinforcement learning. The known exploration policy addresses this issue by adding noise to the policy for guiding the agent exploring. However, it has two problems that 1) the exploration scale has low adaptability to the training stability due to the added noise from a fixed distribution and 2) the policy learned after the training may be locally optimal because the exploration is insufficient. Adaptive exploration policy addresses the first problem by adjusting the noise scale according to the training stability. But the learned policy may still be locally optimal. In this paper, we propose an adaptive exploration network policy to address this problem by considering exploration direction. The motivation is that the agent should explore in the direction of increasing the sample diversity to avoid the local optimum caused by insufficient exploration. Firstly, we construct a prediction network to predict the next state after the agent makes a decision at the current state. Secondly, we propose an exploration network to generate the exploration direction. To increase the sample diversity, this network is trained by maximizing the distance between the predicted next state from prediction network and the current state. Then we adjust the exploration scale to adapt to the training stability. Finally, we propose adaptive exploration network policy based on the new noise constructed by the generated exploration direction and the adaptive exploration scale. Experiments illustrate the effectiveness of our method.","PeriodicalId":137914,"journal":{"name":"International Conference on Artificial Intelligence, Virtual Reality, and Visualization","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive exploration network policy for effective exploration in reinforcement learning\",\"authors\":\"Min Li, William Zhu\",\"doi\":\"10.1117/12.2667206\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"How to achieve effective exploration is a key issue in the training of Reinforcement learning. The known exploration policy addresses this issue by adding noise to the policy for guiding the agent exploring. However, it has two problems that 1) the exploration scale has low adaptability to the training stability due to the added noise from a fixed distribution and 2) the policy learned after the training may be locally optimal because the exploration is insufficient. Adaptive exploration policy addresses the first problem by adjusting the noise scale according to the training stability. But the learned policy may still be locally optimal. In this paper, we propose an adaptive exploration network policy to address this problem by considering exploration direction. The motivation is that the agent should explore in the direction of increasing the sample diversity to avoid the local optimum caused by insufficient exploration. Firstly, we construct a prediction network to predict the next state after the agent makes a decision at the current state. Secondly, we propose an exploration network to generate the exploration direction. 
To increase the sample diversity, this network is trained by maximizing the distance between the predicted next state from prediction network and the current state. Then we adjust the exploration scale to adapt to the training stability. Finally, we propose adaptive exploration network policy based on the new noise constructed by the generated exploration direction and the adaptive exploration scale. Experiments illustrate the effectiveness of our method.\",\"PeriodicalId\":137914,\"journal\":{\"name\":\"International Conference on Artificial Intelligence, Virtual Reality, and Visualization\",\"volume\":\"50 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Artificial Intelligence, Virtual Reality, and Visualization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.2667206\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Artificial Intelligence, Virtual Reality, and Visualization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2667206","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

How to achieve effective exploration is a key issue in training reinforcement learning agents. Existing exploration policies address this issue by adding noise to the policy to guide the agent's exploration. However, they have two problems: 1) because the noise is drawn from a fixed distribution, the exploration scale adapts poorly to the stability of training, and 2) the policy learned after training may be locally optimal because exploration is insufficient. Adaptive exploration policies address the first problem by adjusting the noise scale according to the training stability, but the learned policy may still be locally optimal. In this paper, we propose an adaptive exploration network policy that addresses this problem by also considering the exploration direction. The motivation is that the agent should explore in the direction that increases sample diversity, thereby avoiding the local optima caused by insufficient exploration. First, we construct a prediction network that predicts the next state after the agent makes a decision in the current state. Second, we propose an exploration network that generates the exploration direction; to increase sample diversity, this network is trained by maximizing the distance between the next state predicted by the prediction network and the current state. We then adjust the exploration scale to adapt to the training stability. Finally, we form the adaptive exploration network policy from the new noise constructed from the generated exploration direction and the adaptive exploration scale. Experiments illustrate the effectiveness of our method.
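The abstract describes the mechanism only at a high level. The following is a minimal PyTorch sketch of how the two networks and the directed, adaptively scaled noise could fit together; the network architectures, the distance-based loss form, the stability measure (standard deviation of recent returns), and all dimensions and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the abstract does not specify them.
STATE_DIM, ACTION_DIM, HIDDEN = 8, 2, 64


class PredictionNet(nn.Module):
    """Predicts the next state from the current state and the chosen action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, STATE_DIM),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class ExplorationNet(nn.Module):
    """Generates an exploration direction (a perturbation of the policy action)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)


def exploration_loss(pred_net, expl_net, state, policy_action):
    """Train the exploration network by maximizing the distance between the
    predicted next state and the current state (minimize its negative)."""
    direction = expl_net(state)
    predicted_next = pred_net(state, policy_action + direction)
    return -torch.norm(predicted_next - state, dim=-1).mean()


def adaptive_scale(recent_returns, base_scale=0.1):
    """One plausible stand-in for the paper's scale-adjustment rule: keep the
    exploration scale larger while recent returns still fluctuate, and let it
    shrink toward base_scale as training stabilizes."""
    if len(recent_returns) < 2:
        return base_scale
    std = torch.tensor(recent_returns).float().std()
    return float(base_scale * (1.0 + std))


def explore_action(policy_action, expl_net, state, scale):
    """Behaviour action: policy action plus directed, adaptively scaled noise."""
    with torch.no_grad():
        noise = scale * expl_net(state)
    return policy_action + noise
```

In a full training loop one would presumably fit the prediction network with a supervised next-state loss on replayed transitions, update the exploration network with `exploration_loss`, and act with `explore_action`; those details are not given in the abstract and are left out of the sketch.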