Xiaoyu Tan;Chao Qu;Junwu Xiong;James Zhang;Xihe Qiu;Yaochu Jin
Model-Based Off-Policy Deep Reinforcement Learning With Model-Embedding
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 4, pp. 2974-2986. Published 2024-03-12. Impact factor: 5.3 (JCR Q1, Computer Science, Artificial Intelligence).
DOI: 10.1109/TETCI.2024.3369636
URL: https://ieeexplore.ieee.org/document/10463525/
Citations: 0
Abstract
Model-based reinforcement learning (MBRL) has shown advantages in sample efficiency over model-free reinforcement learning (MFRL) by leveraging control-based domain knowledge. Despite the impressive results it achieves, MBRL is still outperformed by MFRL due to the lack of unlimited interactions with the environment. While imaginary data can be generated by imagining the trajectories of future states, the trade-off between how much generated data is used and the influence of model bias remains to be resolved. In this paper, we propose a simple and elegant off-policy model-based deep reinforcement learning algorithm, called MEMB, with a model embedded in the framework of probabilistic reinforcement learning. To balance sample efficiency against model bias, we exploit both real and imaginary data in training. In particular, we embed the model in the policy update and learn value functions from the real data set. We also provide a theoretical analysis of MEMB under a Lipschitz continuity assumption on the model and policy, proving the reliability of short-term imaginary rollouts. Finally, we evaluate MEMB on several benchmarks and demonstrate that our algorithm can achieve state-of-the-art performance.
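The abstract's point about short-term rollouts can be illustrated with the standard compounding-error argument: if the dynamics are K-Lipschitz and the learned model is off by at most eps per step, the gap after H imagined steps is at most eps * (1 + K + ... + K^(H-1)). The sketch below is not the paper's theorem or the MEMB algorithm; it is a toy one-dimensional example, and the constants `K` and `EPS`, the linear dynamics, and all function names are assumptions made only for illustration.

```python
def rollout(step, s0, horizon):
    """Apply a one-step transition function `horizon` times starting from s0."""
    s = s0
    for _ in range(horizon):
        s = step(s)
    return s


def geometric_bound(eps, lipschitz, horizon):
    """Worst-case H-step error: eps * (1 + K + ... + K^(H-1))."""
    return eps * sum(lipschitz ** i for i in range(horizon))


K = 1.2     # hypothetical Lipschitz constant of the toy dynamics
EPS = 0.01  # hypothetical one-step model error


def true_step(s):
    """Toy ground-truth dynamics (assumed linear for illustration)."""
    return K * s


def model_step(s):
    """Toy learned model: the true dynamics plus a constant bias EPS."""
    return K * s + EPS


def rollout_error(horizon, s0=1.0):
    """Absolute gap between imagined and true state after `horizon` steps."""
    imagined = rollout(model_step, s0, horizon)
    real = rollout(true_step, s0, horizon)
    return abs(imagined - real)


if __name__ == "__main__":
    # Print horizon, actual rollout error, and the geometric bound.
    for h in (1, 5, 20):
        print(h, rollout_error(h), geometric_bound(EPS, K, h))
```

In this toy case the constant bias makes the error meet the bound at every horizon, so the printed pairs agree up to floating-point rounding. With K > 1 the error grows geometrically in H, which is the usual motivation for keeping imagined rollouts short.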
About the journal:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication. TETCI publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.