Yunkun Xu, Zhen-yu Liu, Guifang Duan, Jianrong Tan
{"title":"基于自适应集成值扩展的模型辅助强化学习","authors":"Yunkun Xu, Zhen-yu Liu, Guifang Duan, Jianrong Tan","doi":"10.1109/CEECT53198.2021.9672626","DOIUrl":null,"url":null,"abstract":"Integrated with model-based approaches, reinforcement learning can achieve high performance with low sample complexity. However, the inaccurate learned dynamics model will degrade the performance, and the cumulative bias increases with the length of imaginary rollout. A key challenge is to improve sample efficiency without introducing significant errors. In this paper, Model-assisted Adaptive Ensemble Value Expansion (MAEVE) is proposed, which augments value expansion with imaginary training. By explicitly estimating the uncertainty of the dynamics and the value fucntion based on stochastic ensemble method, MAEVE adjusts the length of rollouts adaptively to maintain a dynamic balance between sample complexity and computational complexity. Considering the impact of the cumulative model bias on different rollout-length, MAEVE adjusts the sampling probabilities of samples at different imagination-depths instead of treating them equally. Therefore, MAEVE ensures that the learned dynamics model is only utilized if it does not introduce serious errors. 
Altogether, our approach significantly increases the sample efficiency compared to model-free and model-based baselines on challenging continuous control benchmarks without performance degradation.","PeriodicalId":153030,"journal":{"name":"2021 3rd International Conference on Electrical Engineering and Control Technologies (CEECT)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Model-Assisted Reinforcement Learning with Adaptive Ensemble Value Expansion\",\"authors\":\"Yunkun Xu, Zhen-yu Liu, Guifang Duan, Jianrong Tan\",\"doi\":\"10.1109/CEECT53198.2021.9672626\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Integrated with model-based approaches, reinforcement learning can achieve high performance with low sample complexity. However, the inaccurate learned dynamics model will degrade the performance, and the cumulative bias increases with the length of imaginary rollout. A key challenge is to improve sample efficiency without introducing significant errors. In this paper, Model-assisted Adaptive Ensemble Value Expansion (MAEVE) is proposed, which augments value expansion with imaginary training. By explicitly estimating the uncertainty of the dynamics and the value fucntion based on stochastic ensemble method, MAEVE adjusts the length of rollouts adaptively to maintain a dynamic balance between sample complexity and computational complexity. Considering the impact of the cumulative model bias on different rollout-length, MAEVE adjusts the sampling probabilities of samples at different imagination-depths instead of treating them equally. Therefore, MAEVE ensures that the learned dynamics model is only utilized if it does not introduce serious errors. 
Altogether, our approach significantly increases the sample efficiency compared to model-free and model-based baselines on challenging continuous control benchmarks without performance degradation.\",\"PeriodicalId\":153030,\"journal\":{\"name\":\"2021 3rd International Conference on Electrical Engineering and Control Technologies (CEECT)\",\"volume\":\"123 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 3rd International Conference on Electrical Engineering and Control Technologies (CEECT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CEECT53198.2021.9672626\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 3rd International Conference on Electrical Engineering and Control Technologies (CEECT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CEECT53198.2021.9672626","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Model-Assisted Reinforcement Learning with Adaptive Ensemble Value Expansion
Integrated with model-based approaches, reinforcement learning can achieve high performance with low sample complexity. However, an inaccurately learned dynamics model degrades performance, and the cumulative bias grows with the length of the imaginary rollout. A key challenge is to improve sample efficiency without introducing significant errors. In this paper, Model-assisted Adaptive Ensemble Value Expansion (MAEVE) is proposed, which augments value expansion with imaginary training. By explicitly estimating the uncertainty of the dynamics and the value function with a stochastic ensemble method, MAEVE adaptively adjusts the rollout length to maintain a dynamic balance between sample complexity and computational complexity. Accounting for the cumulative model bias at different rollout lengths, MAEVE adjusts the sampling probabilities of samples at different imagination depths instead of treating them equally. MAEVE thus ensures that the learned dynamics model is used only when it does not introduce serious errors. Altogether, our approach significantly improves sample efficiency over model-free and model-based baselines on challenging continuous control benchmarks without degrading performance.
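The two mechanisms the abstract describes — capping the imaginary rollout length by ensemble disagreement, and down-weighting deeper imagined samples — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names (`ensemble_disagreement`, `adaptive_horizon`, `depth_weighted_probs`), the disagreement measure (variance of ensemble next-state predictions), and the exponential depth decay are all illustrative assumptions.

```python
import numpy as np

def ensemble_disagreement(models, state, action):
    """Epistemic-uncertainty proxy: variance of next-state predictions
    across the ensemble members (assumed interface: m(state, action))."""
    preds = np.stack([m(state, action) for m in models])  # (K, state_dim)
    return preds.var(axis=0).mean()

def adaptive_horizon(models, state, action, max_h, threshold):
    """Grow the imaginary rollout only while the ensemble still agrees;
    stop once disagreement exceeds the threshold. Uses a fixed action
    and one ensemble member to advance the state, for simplicity."""
    h, s = 0, state
    while h < max_h:
        if ensemble_disagreement(models, s, action) > threshold:
            break
        s = models[0](s, action)  # advance with one ensemble member
        h += 1
    return h

def depth_weighted_probs(depths, decay=0.8):
    """Down-weight deeper imagined samples, whose cumulative model bias
    is larger, instead of sampling all imagination depths uniformly."""
    w = decay ** np.asarray(depths, dtype=float)
    return w / w.sum()
```

With two agreeing models the horizon runs to `max_h`; as disagreement rises, rollouts shorten, and `depth_weighted_probs` yields a distribution that prefers shallow (less biased) imagined samples when drawing a training batch.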