Online Adaptable Offline RL With Guidance Model.

IF 10.2 | CAS Tier 1 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xun Wang, Jingmian Wang, Zhuzhong Qian, Bolei Zhang
{"title":"Online Adaptable Offline RL With Guidance Model.","authors":"Xun Wang,Jingmian Wang,Zhuzhong Qian,Bolei Zhang","doi":"10.1109/tnnls.2025.3589418","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) has emerged as a promising approach across various applications, yet its reliance on repeated trial-and-error learning to develop effective policies from scratch poses significant challenges for deployment in scenarios where interaction is costly or constrained. In this work, we investigate the offline-to-online RL paradigm, wherein policies are initially pretrained using offline historical datasets and subsequently fine-tuned with a limited amount of online interaction. Previous research has suggested that efficient offline pretraining is crucial for achieving optimal final performance. However, it is challenging to incorporate appropriate conservatism to prevent the overestimation of out-of-distribution (OOD) data while maintaining adaptability for online fine-tuning. To address these issues, we propose an effective offline RL algorithm that integrates a guidance model to introduce suitable conservatism and ensure seamless adaptability to online fine-tuning. Our rigorous theoretical analysis and extensive experimental evaluations demonstrate better performance of our novel algorithm, underscoring the critical role played by the guidance model in enhancing its efficacy.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"34 1","pages":""},"PeriodicalIF":10.2000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3589418","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Reinforcement learning (RL) has emerged as a promising approach across various applications, yet its reliance on repeated trial-and-error learning to develop effective policies from scratch poses significant challenges for deployment in scenarios where interaction is costly or constrained. In this work, we investigate the offline-to-online RL paradigm, wherein policies are initially pretrained on offline historical datasets and subsequently fine-tuned with a limited amount of online interaction. Previous research has suggested that efficient offline pretraining is crucial for achieving optimal final performance. However, it is challenging to incorporate appropriate conservatism to prevent the overestimation of out-of-distribution (OOD) data while maintaining adaptability for online fine-tuning. To address these issues, we propose an effective offline RL algorithm that integrates a guidance model to introduce suitable conservatism and ensure seamless adaptability to online fine-tuning. Our rigorous theoretical analysis and extensive experimental evaluations demonstrate the superior performance of our novel algorithm, underscoring the critical role played by the guidance model in enhancing its efficacy.
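The abstract describes the recipe only at a high level, and the form of the guidance model is not given here. As a purely illustrative sketch of the offline-to-online paradigm it refers to, the Python snippet below pretrains an actor under a strong behavior-cloning penalty (in the style of TD3+BC, used here as a stand-in for the paper's guidance model) and then anneals that conservatism during online fine-tuning. All dimensions, data, and hyperparameters are hypothetical, and critic TD updates and real environment interaction are omitted for brevity.

```python
# Hypothetical sketch of the offline-to-online recipe the abstract describes:
# pretrain with strong conservatism on a fixed dataset, then relax the
# conservatism during online fine-tuning. The paper's guidance model is not
# specified in the abstract; a TD3+BC-style behavior-cloning penalty stands
# in for it here, and all sizes/data/hyperparameters are placeholders.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 17, 6  # placeholder sizes (e.g., a locomotion task)

actor = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(),
                      nn.Linear(256, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
                       nn.Linear(256, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def conservative_actor_loss(states, dataset_actions, alpha):
    """Maximize the critic's value while staying close to dataset actions.

    alpha scales the conservatism term; annealing it toward 0 during online
    fine-tuning recovers an unconstrained RL objective.
    """
    pi = actor(states)
    q = critic(torch.cat([states, pi], dim=-1))
    lam = 1.0 / (q.abs().mean().detach() + 1e-6)  # normalize by Q magnitude
    return -(lam * q).mean() + alpha * ((pi - dataset_actions) ** 2).mean()

# Phase 1: offline pretraining on a fixed (here: random stand-in) dataset.
states = torch.randn(256, STATE_DIM)
actions = torch.rand(256, ACTION_DIM) * 2 - 1
for _ in range(500):
    loss = conservative_actor_loss(states, actions, alpha=2.5)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: online fine-tuning with the conservatism weight annealed away.
for step in range(100):
    alpha = 2.5 * max(0.0, 1.0 - step / 50)
    loss = conservative_actor_loss(states, actions, alpha=alpha)
    opt.zero_grad(); loss.backward(); opt.step()
```

In the actual paper, the guidance model presumably supplies a learned signal for how conservative to be, rather than the fixed penalty and hand-tuned annealing schedule shown here, which are purely illustrative.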
Source journal
IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Articles per year: 2102
Review time: 3-8 weeks
About the journal: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.