Model-Assisted Reinforcement Learning with Adaptive Ensemble Value Expansion

Yunkun Xu, Zhen-yu Liu, Guifang Duan, Jianrong Tan
Published in: 2021 3rd International Conference on Electrical Engineering and Control Technologies (CEECT), 2021-12-01. DOI: 10.1109/CEECT53198.2021.9672626 (https://doi.org/10.1109/CEECT53198.2021.9672626)

Abstract

Integrated with model-based approaches, reinforcement learning can achieve high performance with low sample complexity. However, an inaccurate learned dynamics model degrades performance, and the cumulative bias grows with the length of the imaginary rollout. A key challenge is to improve sample efficiency without introducing significant errors. In this paper, Model-assisted Adaptive Ensemble Value Expansion (MAEVE) is proposed, which augments value expansion with imaginary training. By explicitly estimating the uncertainty of the dynamics and the value function with a stochastic ensemble method, MAEVE adaptively adjusts the rollout length to maintain a dynamic balance between sample complexity and computational complexity. Because the cumulative model bias differs across rollout lengths, MAEVE adjusts the sampling probabilities of samples at different imagination depths instead of treating them equally. In this way, MAEVE ensures that the learned dynamics model is used only when it does not introduce serious errors. Altogether, our approach significantly increases sample efficiency compared to model-free and model-based baselines on challenging continuous control benchmarks, without performance degradation.
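The abstract does not give the exact rules MAEVE uses, so the following is only a minimal sketch of the two ideas it names: cutting a rollout short when ensemble uncertainty becomes large, and sampling shallower imagined steps more often than deeper ones. Everything here is an assumption made for illustration: the function names `adaptive_horizon` and `depth_sampling_probs`, the use of ensemble standard deviation as the uncertainty measure, the fixed `threshold`, and the softmax over negative cumulative disagreement with a `temperature` parameter. None of it is taken from the paper.

```python
import numpy as np


def adaptive_horizon(ensemble_preds, threshold, max_horizon):
    """Pick a rollout length from ensemble disagreement (illustrative assumption).

    ensemble_preds: array of shape (H, K, D) holding the next-state
    predictions of K ensemble members at each of H imagined steps.
    Returns the number of leading steps whose disagreement stays at or
    below `threshold`, plus the per-step disagreement itself.
    """
    # Disagreement per step: std across members, averaged over state dims.
    disagreement = ensemble_preds.std(axis=1).mean(axis=-1)  # shape (H,)
    horizon = 0
    for step in range(min(max_horizon, len(disagreement))):
        if disagreement[step] > threshold:
            break  # stop trusting the model from this depth onward
        horizon += 1
    return horizon, disagreement


def depth_sampling_probs(disagreement, horizon, temperature=1.0):
    """Sampling probabilities over imagination depths 0..horizon-1.

    Deeper steps accumulate more disagreement and are down-weighted by a
    softmax over negative cumulative disagreement (hypothetical choice).
    """
    if horizon == 0:
        return np.array([])
    cumulative = np.cumsum(disagreement[:horizon])
    logits = -cumulative / temperature
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()
```

With this sketch, a rollout whose per-step disagreement crosses the threshold at the third step would be truncated to two steps, and the first of those two steps would be sampled with higher probability than the second. A real implementation would also need the value-function uncertainty that the abstract mentions, which is omitted here.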