{"title":"带引导模型的在线自适应离线强化学习。","authors":"Xun Wang,Jingmian Wang,Zhuzhong Qian,Bolei Zhang","doi":"10.1109/tnnls.2025.3589418","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) has emerged as a promising approach across various applications, yet its reliance on repeated trial-and-error learning to develop effective policies from scratch poses significant challenges for deployment in scenarios where interaction is costly or constrained. In this work, we investigate the offline-to-online RL paradigm, wherein policies are initially pretrained using offline historical datasets and subsequently fine-tuned with a limited amount of online interaction. Previous research has suggested that efficient offline pretraining is crucial for achieving optimal final performance. However, it is challenging to incorporate appropriate conservatism to prevent the overestimation of out-of-distribution (OOD) data while maintaining adaptability for online fine-tuning. To address these issues, we propose an effective offline RL algorithm that integrates a guidance model to introduce suitable conservatism and ensure seamless adaptability to online fine-tuning. Our rigorous theoretical analysis and extensive experimental evaluations demonstrate better performance of our novel algorithm, underscoring the critical role played by the guidance model in enhancing its efficacy.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"34 1","pages":""},"PeriodicalIF":10.2000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Online Adaptable Offline RL With Guidance Model.\",\"authors\":\"Xun Wang,Jingmian Wang,Zhuzhong Qian,Bolei Zhang\",\"doi\":\"10.1109/tnnls.2025.3589418\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) has emerged as a promising approach across various applications, yet its reliance on repeated trial-and-error learning to develop effective policies from scratch poses significant challenges for deployment in scenarios where interaction is costly or constrained. In this work, we investigate the offline-to-online RL paradigm, wherein policies are initially pretrained using offline historical datasets and subsequently fine-tuned with a limited amount of online interaction. Previous research has suggested that efficient offline pretraining is crucial for achieving optimal final performance. However, it is challenging to incorporate appropriate conservatism to prevent the overestimation of out-of-distribution (OOD) data while maintaining adaptability for online fine-tuning. To address these issues, we propose an effective offline RL algorithm that integrates a guidance model to introduce suitable conservatism and ensure seamless adaptability to online fine-tuning. 
Our rigorous theoretical analysis and extensive experimental evaluations demonstrate better performance of our novel algorithm, underscoring the critical role played by the guidance model in enhancing its efficacy.\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"34 1\",\"pages\":\"\"},\"PeriodicalIF\":10.2000,\"publicationDate\":\"2025-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tnnls.2025.3589418\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3589418","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract: Reinforcement learning (RL) has emerged as a promising approach across various applications, yet its reliance on repeated trial-and-error learning to develop effective policies from scratch poses significant challenges for deployment in scenarios where interaction is costly or constrained. In this work, we investigate the offline-to-online RL paradigm, wherein policies are first pretrained on offline historical datasets and subsequently fine-tuned with a limited amount of online interaction. Previous research suggests that efficient offline pretraining is crucial for achieving optimal final performance. However, it is challenging to incorporate enough conservatism to prevent overestimation on out-of-distribution (OOD) data while preserving adaptability for online fine-tuning. To address these issues, we propose an effective offline RL algorithm that integrates a guidance model to introduce suitable conservatism while ensuring seamless adaptability to online fine-tuning. Our rigorous theoretical analysis and extensive experimental evaluations demonstrate the superior performance of the proposed algorithm and underscore the critical role the guidance model plays in its efficacy.
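The abstract describes the offline-to-online paradigm only at a high level. The sketch below is purely illustrative and is not the paper's algorithm: it shows one generic way a guidance model (here, a count-based estimate of the behavior policy, a hypothetical choice) can inject conservatism during offline Q-learning and then be annealed away during online fine-tuning. The toy MDP, the penalty form, and all hyperparameters are assumptions for illustration.

```python
"""
Illustrative sketch only -- NOT the paper's method. It mirrors the generic
offline-to-online RL structure described in the abstract: Q-values are first
pretrained on a fixed offline dataset with a conservatism penalty derived
from a "guidance" (behavior) model, then fine-tuned online with the penalty
relaxed. The MDP, penalty, and hyperparameters are hypothetical.
"""
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.95

# Toy MDP: P[s, a] -> next state, R[s, a] -> reward (hypothetical).
P = rng.integers(0, N_STATES, size=(N_STATES, N_ACTIONS))
R = rng.normal(size=(N_STATES, N_ACTIONS))


def step(s, a):
    return P[s, a], R[s, a]


# ---- Offline dataset collected by an unknown behavior policy -------------
def collect_offline(n=2000):
    data, s = [], 0
    for _ in range(n):
        a = int(rng.random() < 0.8)          # behavior policy prefers action 1
        s2, r = step(s, a)
        data.append((s, a, r, s2))
        s = s2
    return data


# ---- Guidance model: empirical estimate of the behavior policy -----------
def fit_guidance(data):
    counts = np.ones((N_STATES, N_ACTIONS))  # Laplace smoothing
    for s, a, _, _ in data:
        counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def q_update(Q, s, a, r, s2, guidance, lam, lr=0.1):
    # Conservative target: penalize bootstrapping through actions the
    # guidance model considers unlikely (i.e., out-of-distribution).
    penalized = Q[s2] - lam * (1.0 - guidance[s2])
    target = r + GAMMA * penalized.max()
    Q[s, a] += lr * (target - Q[s, a])


# ---- Phase 1: offline pretraining with strong conservatism ---------------
data = collect_offline()
guidance = fit_guidance(data)
Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(20):
    for s, a, r, s2 in data:
        q_update(Q, s, a, r, s2, guidance, lam=1.0)

# ---- Phase 2: online fine-tuning, conservatism annealed toward zero ------
s, lam = 0, 1.0
for t in range(5000):
    a = int(Q[s].argmax()) if rng.random() > 0.1 else int(rng.integers(N_ACTIONS))
    s2, r = step(s, a)
    q_update(Q, s, a, r, s2, guidance, lam=lam)
    lam = max(0.0, lam - 1e-3)               # gradually drop the penalty
    s = s2

print("Q-values after offline-to-online training:\n", Q)
```

Annealing the penalty weight `lam` to zero is just one simple way to keep offline conservatism from hindering online adaptation; the mechanism used in the paper itself may differ.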
Journal introduction:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.