Incremental model-based reinforcement learning with model constraint
Zhiyou Yang, Mingsheng Fu, Hong Qu, Fan Li, Shuqing Shi, Wang Hu
Neural Networks, Volume 185, Article 107245. Published 2025-02-08. DOI: 10.1016/j.neunet.2025.107245
Abstract
In model-based reinforcement learning (RL), an estimated model of the real environment is learned from limited data and then used for policy optimization. The policy optimization process in model-based RL is therefore influenced by updates to both the policy and the estimated model. In practice, previous model-based RL methods apply an incremental constraint only to policy updates, which cannot ensure that the overall optimization proceeds incrementally and thereby limits performance. To address this issue, we analyze the policy optimization procedure of model-based RL and propose an incremental model-based RL update scheme. The scheme combines an incremental model constraint, which guarantees incremental updates to the estimated model, with an incremental policy constraint, which ensures incremental updates to the policy. Furthermore, we establish a performance bound between the real environment and the estimated model under this update scheme, which guarantees non-decreasing policy performance in the real environment. To implement the scheme, we develop a simple and efficient model-based RL algorithm, IMPO (Incremental Model-based Policy Optimization), which leverages knowledge from previous iterations to stabilize learning. Experimental results across various control benchmarks demonstrate that IMPO significantly outperforms previous state-of-the-art model-based RL methods in both overall performance and sample efficiency.
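To make the idea of an incremental update scheme concrete, the toy sketch below shows a loop in which both the estimated dynamics model and the policy are constrained to change only incrementally between iterations. This is not the authors' IMPO implementation (the abstract does not give one); the 1-D linear environment, the function names, and the penalty and clipping parameters are all assumptions made purely for illustration.

```python
# Illustrative sketch only: NOT the IMPO algorithm from the paper.
# Idea shown: constrain BOTH the learned model and the policy to stay close
# to their previous iterates (incremental model + incremental policy updates).
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" environment: 1-D linear dynamics s' = a_true*s + b_true*u + noise.
a_true, b_true = 0.9, 0.5

def real_step(s, u):
    return a_true * s + b_true * u + 0.01 * rng.standard_normal()

def collect_data(policy_gain, n=64):
    """Roll out the current linear policy u = gain*s (plus exploration noise)."""
    s = rng.standard_normal()
    data = []
    for _ in range(n):
        u = policy_gain * s + 0.3 * rng.standard_normal()
        s_next = real_step(s, u)
        data.append((s, u, s_next))
        s = s_next
    return np.array(data)

def fit_model_incremental(data, prev_theta, lam_model=1.0):
    """Regularized least squares: the penalty ||theta - prev_theta||^2 acts as
    an incremental *model* constraint, keeping the new estimated model close
    to the previous one."""
    X = data[:, :2]          # features: (s, u)
    y = data[:, 2]           # targets: s'
    # Solve min_theta ||X theta - y||^2 + lam_model * ||theta - prev_theta||^2
    A = X.T @ X + lam_model * np.eye(2)
    b = X.T @ y + lam_model * prev_theta
    return np.linalg.solve(A, b)

def update_policy_incremental(theta, prev_gain, delta_policy=0.05):
    """One-step improvement under the estimated model, with the change in the
    policy parameter clipped to delta_policy (incremental *policy* constraint)."""
    a_hat, b_hat = theta
    # Greedy gain that drives the model-predicted next state toward 0.
    target_gain = -a_hat / b_hat if abs(b_hat) > 1e-6 else prev_gain
    step = np.clip(target_gain - prev_gain, -delta_policy, delta_policy)
    return prev_gain + step

theta = np.zeros(2)   # previous model estimate (a_hat, b_hat)
gain = 0.0            # previous policy parameter

for it in range(50):
    data = collect_data(gain)
    theta = fit_model_incremental(data, prev_theta=theta)
    gain = update_policy_incremental(theta, prev_gain=gain)

print("estimated model:", theta, "policy gain:", gain)
```

In this sketch the regularization toward the previous model parameters and the clipped policy step play the roles of the model and policy constraints, respectively; the paper's actual constraints, performance bound, and algorithmic details are given in the full text.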
About the journal
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.