Enhancing model learning in reinforcement learning through Q-function-guided trajectory alignment

Xin Du, Shan Zhong, Shengrong Gong, Yali Si, Zhenyu Qi

Applied Intelligence, vol. 55, no. 10. Published 2025-04-29. DOI: 10.1007/s10489-024-06083-9
Citations: 0
Abstract
Model-based reinforcement learning (MBRL) methods hold great promise for achieving excellent sample efficiency by fitting a dynamics model to previously observed data and leveraging it for RL or planning. However, the resulting trajectories may diverge from real-world trajectories due to the accumulation of errors in multi-step model sampling, particularly over longer horizons. This undermines the performance of MBRL and significantly reduces sample efficiency. We therefore present a trajectory alignment method that aligns simulated trajectories with their real counterparts from any initial random state and with adaptive length, enabling the preparation of paired real-simulated samples to minimize compounding errors. Additionally, we design a Q-function to estimate Q-values for the paired real-simulated samples. Simulated samples whose Q-value difference from the real ones exceeds a given threshold are discarded, preventing the model from overfitting to erroneous samples. Experimental results demonstrate that both trajectory alignment and Q-function-guided sample filtration contribute to improving policy performance and sample efficiency. Our method surpasses previous state-of-the-art model-based approaches in both sample efficiency and asymptotic performance across a series of challenging control tasks. The code is open source and available at https://github.com/duxin0618/qgtambpo.git.
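To make the filtering rule concrete, below is a minimal PyTorch sketch of Q-value-difference filtering over paired real-simulated samples, as described in the abstract. All names here (q_network, filter_simulated_samples, threshold, and the dict-based batch layout) are illustrative assumptions rather than the authors' released implementation; see the linked repository for the actual code.

import torch


def filter_simulated_samples(q_network, real_batch, sim_batch, threshold):
    """Keep only simulated transitions whose Q-value stays close to that
    of the paired real transition.

    real_batch / sim_batch: dicts with 'state' (N, state_dim) and
    'action' (N, action_dim) tensors, paired index-by-index by the
    trajectory alignment step (an assumed layout).
    """
    with torch.no_grad():
        q_real = q_network(real_batch["state"], real_batch["action"])
        q_sim = q_network(sim_batch["state"], sim_batch["action"])

    # Discard simulated samples whose Q-value difference from the real
    # counterpart exceeds the threshold, so later updates do not overfit
    # to erroneous model rollouts.
    keep = (q_real - q_sim).abs().squeeze(-1) <= threshold
    return {k: v[keep] for k, v in sim_batch.items()}

In practice such a filter would be applied to each batch of model-generated transitions before adding it to the buffer used for policy updates, so only rollouts consistent with the real trajectory (as judged by the Q-function) are kept.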
Journal Description
With a focus on research in artificial intelligence and neural networks, this journal addresses real-life manufacturing, defense, management, government, and industrial problems that are too complex to be solved through conventional approaches and that require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments addressing complex, real-world problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.