Model-Assisted Reinforcement Learning with Adaptive Ensemble Value Expansion
Yunkun Xu, Zhen-yu Liu, Guifang Duan, Jianrong Tan
2021 3rd International Conference on Electrical Engineering and Control Technologies (CEECT), December 2021
DOI: 10.1109/CEECT53198.2021.9672626
Citations: 0
Abstract
Integrated with model-based approaches, reinforcement learning can achieve high performance with low sample complexity. However, an inaccurately learned dynamics model degrades performance, and the cumulative bias grows with the length of the imagined rollout. A key challenge is therefore to improve sample efficiency without introducing significant errors. In this paper, Model-assisted Adaptive Ensemble Value Expansion (MAEVE) is proposed, which augments value expansion with imaginary training. By explicitly estimating the uncertainty of the dynamics model and the value function with a stochastic ensemble method, MAEVE adaptively adjusts the rollout length to maintain a dynamic balance between sample complexity and computational complexity. Because cumulative model bias affects rollouts of different lengths differently, MAEVE assigns different sampling probabilities to samples at different imagination depths instead of treating them equally. MAEVE therefore ensures that the learned dynamics model is used only where it does not introduce serious errors. Altogether, on challenging continuous control benchmarks, the approach significantly improves sample efficiency over model-free and model-based baselines without degrading performance.
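To make the adaptive-rollout idea concrete, below is a minimal sketch of how ensemble disagreement can gate the imagined rollout length. It is not the paper's implementation: the toy linear dynamics models, the `threshold` and `max_h` values, and the helper names (`ensemble_disagreement`, `adaptive_horizon`) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical ensemble of learned dynamics models, each mapping
# (state, action) -> next state. Toy linear models with slightly
# different weights stand in for learned networks, so the script
# is self-contained and runnable.
rng = np.random.default_rng(0)
ensemble = [
    lambda s, a, W=rng.normal(1.0, 0.05, size=(4, 4)): s @ W + 0.1 * a
    for _ in range(5)
]

def ensemble_disagreement(state, action):
    """Std. deviation across ensemble predictions, averaged over
    state dimensions -- a simple proxy for model uncertainty."""
    preds = np.stack([f(state, action) for f in ensemble])
    return preds.std(axis=0).mean()

def adaptive_horizon(state, action, max_h=10, threshold=0.15):
    """Grow the imagined rollout one step at a time and stop as soon
    as ensemble disagreement exceeds the threshold, so the learned
    model is only used where it appears trustworthy. For simplicity
    the same action is replayed at every step."""
    h, s = 0, state
    for _ in range(max_h):
        if ensemble_disagreement(s, action) > threshold:
            break
        # Advance with the ensemble-mean prediction.
        s = np.mean([f(s, action) for f in ensemble], axis=0)
        h += 1
    return h

s0, a0 = np.ones(4), np.zeros(4)
print("chosen rollout length:", adaptive_horizon(s0, a0))
```

In this sketch the horizon shrinks automatically in regions where the ensemble members disagree, which is the balance between sample complexity (longer imagined rollouts) and model bias that the abstract describes.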
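The abstract also states that imagined samples at different depths are not replayed with equal probability. A plausible reading is a depth-decayed sampling weight, sketched below; the exponential-decay form and the `decay` value are assumptions for illustration, not details from the paper.

```python
import numpy as np

def depth_weighted_probs(depths, decay=0.8):
    """Assign each imagined transition a sampling weight that decays
    with its imagination depth, so deep (more biased) samples are
    replayed less often than shallow ones. `decay` is a hypothetical
    knob, not a value reported in the paper."""
    weights = decay ** np.asarray(depths, dtype=float)
    return weights / weights.sum()

# Example: a small imagined buffer whose transitions sit at depths 0..3.
depths = [0, 0, 1, 1, 2, 3]
probs = depth_weighted_probs(depths)
rng = np.random.default_rng(1)
batch_idx = rng.choice(len(depths), size=4, p=probs)  # depth-biased minibatch
print(np.round(probs, 3), batch_idx)
```

Under this scheme a depth-0 transition is drawn roughly twice as often as a depth-3 one, matching the intuition that cumulative model bias grows with imagination depth.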