{"title":"带引导模型的在线自适应离线强化学习。","authors":"Xun Wang,Jingmian Wang,Zhuzhong Qian,Bolei Zhang","doi":"10.1109/tnnls.2025.3589418","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) has emerged as a promising approach across various applications, yet its reliance on repeated trial-and-error learning to develop effective policies from scratch poses significant challenges for deployment in scenarios where interaction is costly or constrained. In this work, we investigate the offline-to-online RL paradigm, wherein policies are initially pretrained using offline historical datasets and subsequently fine-tuned with a limited amount of online interaction. Previous research has suggested that efficient offline pretraining is crucial for achieving optimal final performance. However, it is challenging to incorporate appropriate conservatism to prevent the overestimation of out-of-distribution (OOD) data while maintaining adaptability for online fine-tuning. To address these issues, we propose an effective offline RL algorithm that integrates a guidance model to introduce suitable conservatism and ensure seamless adaptability to online fine-tuning. Our rigorous theoretical analysis and extensive experimental evaluations demonstrate better performance of our novel algorithm, underscoring the critical role played by the guidance model in enhancing its efficacy.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"34 1","pages":""},"PeriodicalIF":10.2000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Online Adaptable Offline RL With Guidance Model.\",\"authors\":\"Xun Wang,Jingmian Wang,Zhuzhong Qian,Bolei Zhang\",\"doi\":\"10.1109/tnnls.2025.3589418\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) has emerged as a promising approach across various applications, yet its reliance on repeated trial-and-error learning to develop effective policies from scratch poses significant challenges for deployment in scenarios where interaction is costly or constrained. In this work, we investigate the offline-to-online RL paradigm, wherein policies are initially pretrained using offline historical datasets and subsequently fine-tuned with a limited amount of online interaction. Previous research has suggested that efficient offline pretraining is crucial for achieving optimal final performance. However, it is challenging to incorporate appropriate conservatism to prevent the overestimation of out-of-distribution (OOD) data while maintaining adaptability for online fine-tuning. To address these issues, we propose an effective offline RL algorithm that integrates a guidance model to introduce suitable conservatism and ensure seamless adaptability to online fine-tuning. 
Our rigorous theoretical analysis and extensive experimental evaluations demonstrate better performance of our novel algorithm, underscoring the critical role played by the guidance model in enhancing its efficacy.\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"34 1\",\"pages\":\"\"},\"PeriodicalIF\":10.2000,\"publicationDate\":\"2025-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tnnls.2025.3589418\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3589418","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract: Reinforcement learning (RL) has emerged as a promising approach across various applications, yet its reliance on repeated trial-and-error learning to develop effective policies from scratch poses significant challenges for deployment in scenarios where interaction is costly or constrained. In this work, we investigate the offline-to-online RL paradigm, wherein policies are first pretrained on offline historical datasets and subsequently fine-tuned with a limited amount of online interaction. Previous research suggests that efficient offline pretraining is crucial for achieving optimal final performance. However, it is challenging to incorporate enough conservatism to prevent overestimation on out-of-distribution (OOD) data while preserving adaptability for online fine-tuning. To address these issues, we propose an effective offline RL algorithm that integrates a guidance model to introduce suitable conservatism while ensuring seamless adaptability to online fine-tuning. Our rigorous theoretical analysis and extensive experimental evaluations demonstrate the superior performance of the proposed algorithm and underscore the critical role the guidance model plays in its efficacy.
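The abstract describes the offline-to-online paradigm only at a high level. The sketch below is purely illustrative and is not the paper's algorithm: it shows one generic way a guidance model (here, a count-based estimate of the behavior policy, a hypothetical choice) can inject conservatism during offline Q-learning and then be annealed away during online fine-tuning. The toy MDP, the penalty form, and all hyperparameters are assumptions for illustration.

```python
"""
Illustrative sketch only -- NOT the paper's method. It mirrors the generic
offline-to-online RL structure described in the abstract: Q-values are first
pretrained on a fixed offline dataset with a conservatism penalty derived
from a "guidance" (behavior) model, then fine-tuned online with the penalty
relaxed. The MDP, penalty, and hyperparameters are hypothetical.
"""
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.95

# Toy MDP: P[s, a] -> next state, R[s, a] -> reward (hypothetical).
P = rng.integers(0, N_STATES, size=(N_STATES, N_ACTIONS))
R = rng.normal(size=(N_STATES, N_ACTIONS))


def step(s, a):
    return P[s, a], R[s, a]


# ---- Offline dataset collected by an unknown behavior policy -------------
def collect_offline(n=2000):
    data, s = [], 0
    for _ in range(n):
        a = int(rng.random() < 0.8)          # behavior policy prefers action 1
        s2, r = step(s, a)
        data.append((s, a, r, s2))
        s = s2
    return data


# ---- Guidance model: empirical estimate of the behavior policy -----------
def fit_guidance(data):
    counts = np.ones((N_STATES, N_ACTIONS))  # Laplace smoothing
    for s, a, _, _ in data:
        counts[s, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def q_update(Q, s, a, r, s2, guidance, lam, lr=0.1):
    # Conservative target: penalize bootstrapping through actions the
    # guidance model considers unlikely (i.e., out-of-distribution).
    penalized = Q[s2] - lam * (1.0 - guidance[s2])
    target = r + GAMMA * penalized.max()
    Q[s, a] += lr * (target - Q[s, a])


# ---- Phase 1: offline pretraining with strong conservatism ---------------
data = collect_offline()
guidance = fit_guidance(data)
Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(20):
    for s, a, r, s2 in data:
        q_update(Q, s, a, r, s2, guidance, lam=1.0)

# ---- Phase 2: online fine-tuning, conservatism annealed toward zero ------
s, lam = 0, 1.0
for t in range(5000):
    a = int(Q[s].argmax()) if rng.random() > 0.1 else int(rng.integers(N_ACTIONS))
    s2, r = step(s, a)
    q_update(Q, s, a, r, s2, guidance, lam=lam)
    lam = max(0.0, lam - 1e-3)               # gradually drop the penalty
    s = s2

print("Q-values after offline-to-online training:\n", Q)
```

Annealing the penalty weight `lam` to zero is just one simple way to keep offline conservatism from hindering online adaptation; the mechanism used in the paper itself may differ.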
Journal introduction:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.