Model-Assisted Reinforcement Learning with Adaptive Ensemble Value Expansion
Yunkun Xu, Zhen-yu Liu, Guifang Duan, Jianrong Tan
2021 3rd International Conference on Electrical Engineering and Control Technologies (CEECT), December 2021
DOI: 10.1109/CEECT53198.2021.9672626
Citations: 0
Abstract
Integrated with model-based approaches, reinforcement learning can achieve high performance with low sample complexity. However, an inaccurately learned dynamics model degrades performance, and the cumulative bias grows with the length of the imagined rollout. A key challenge is therefore to improve sample efficiency without introducing significant errors. In this paper, Model-assisted Adaptive Ensemble Value Expansion (MAEVE) is proposed, which augments value expansion with imaginary training. By explicitly estimating the uncertainty of the dynamics model and the value function with a stochastic ensemble method, MAEVE adaptively adjusts the rollout length to maintain a dynamic balance between sample complexity and computational complexity. Because cumulative model bias affects rollouts of different lengths differently, MAEVE assigns different sampling probabilities to samples at different imagination depths instead of treating them equally. MAEVE therefore ensures that the learned dynamics model is used only where it does not introduce serious errors. Altogether, on challenging continuous control benchmarks, the approach significantly improves sample efficiency over model-free and model-based baselines without degrading performance.
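To make the adaptive-rollout idea concrete, below is a minimal sketch of how ensemble disagreement can gate the imagined rollout length. It is not the paper's implementation: the toy linear dynamics models, the `threshold` and `max_h` values, and the helper names (`ensemble_disagreement`, `adaptive_horizon`) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical ensemble of learned dynamics models, each mapping
# (state, action) -> next state. Toy linear models with slightly
# different weights stand in for learned networks, so the script
# is self-contained and runnable.
rng = np.random.default_rng(0)
ensemble = [
    lambda s, a, W=rng.normal(1.0, 0.05, size=(4, 4)): s @ W + 0.1 * a
    for _ in range(5)
]

def ensemble_disagreement(state, action):
    """Std. deviation across ensemble predictions, averaged over
    state dimensions -- a simple proxy for model uncertainty."""
    preds = np.stack([f(state, action) for f in ensemble])
    return preds.std(axis=0).mean()

def adaptive_horizon(state, action, max_h=10, threshold=0.15):
    """Grow the imagined rollout one step at a time and stop as soon
    as ensemble disagreement exceeds the threshold, so the learned
    model is only used where it appears trustworthy. For simplicity
    the same action is replayed at every step."""
    h, s = 0, state
    for _ in range(max_h):
        if ensemble_disagreement(s, action) > threshold:
            break
        # Advance with the ensemble-mean prediction.
        s = np.mean([f(s, action) for f in ensemble], axis=0)
        h += 1
    return h

s0, a0 = np.ones(4), np.zeros(4)
print("chosen rollout length:", adaptive_horizon(s0, a0))
```

In this sketch the horizon shrinks automatically in regions where the ensemble members disagree, which is the balance between sample complexity (longer imagined rollouts) and model bias that the abstract describes.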
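The abstract also states that imagined samples at different depths are not replayed with equal probability. A plausible reading is a depth-decayed sampling weight, sketched below; the exponential-decay form and the `decay` value are assumptions for illustration, not details from the paper.

```python
import numpy as np

def depth_weighted_probs(depths, decay=0.8):
    """Assign each imagined transition a sampling weight that decays
    with its imagination depth, so deep (more biased) samples are
    replayed less often than shallow ones. `decay` is a hypothetical
    knob, not a value reported in the paper."""
    weights = decay ** np.asarray(depths, dtype=float)
    return weights / weights.sum()

# Example: a small imagined buffer whose transitions sit at depths 0..3.
depths = [0, 0, 1, 1, 2, 3]
probs = depth_weighted_probs(depths)
rng = np.random.default_rng(1)
batch_idx = rng.choice(len(depths), size=4, p=probs)  # depth-biased minibatch
print(np.round(probs, 3), batch_idx)
```

Under this scheme a depth-0 transition is drawn roughly twice as often as a depth-3 one, matching the intuition that cumulative model bias grows with imagination depth.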