Ensemble successor representations for task generalization in offline-to-online reinforcement learning

Impact Factor 7.3 · CAS Tier 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Changhong Wang, Xudong Yu, Chenjia Bai, Qiaosheng Zhang, Zhen Wang
{"title":"Ensemble successor representations for task generalization in offline-to-online reinforcement learning","authors":"Changhong Wang, Xudong Yu, Chenjia Bai, Qiaosheng Zhang, Zhen Wang","doi":"10.1007/s11432-023-4028-1","DOIUrl":null,"url":null,"abstract":"<p>In reinforcement learning (RL), training a policy from scratch with online experiences can be inefficient because of the difficulties in exploration. Recently, offline RL provides a promising solution by giving an initialized offline policy, which can be refined through online interactions. However, existing approaches primarily perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common that we only have an offline dataset from a specific task while aiming for fast online-adaptation for several tasks. To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning. We demonstrate that the conventional paradigm using successor features cannot effectively utilize offline data and improve the performance for the new task by online fine-tuning. To mitigate this, we introduce a novel methodology that leverages offline data to acquire an ensemble of successor representations and subsequently constructs ensemble <i>Q</i> functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaption of <i>Q</i> functions towards new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence showcasing the superior performance of our method in generalizing to diverse or even unseen tasks.</p>","PeriodicalId":21618,"journal":{"name":"Science China Information Sciences","volume":"75 1","pages":""},"PeriodicalIF":7.3000,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science China Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11432-023-4028-1","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

In reinforcement learning (RL), training a policy from scratch with online experience can be inefficient because of the difficulty of exploration. Offline RL offers a promising alternative by providing an initial policy learned from offline data, which can then be refined through online interaction. However, existing approaches primarily perform offline and online learning on the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common to have an offline dataset from only one specific task while aiming for fast online adaptation to several tasks. To address this problem, our work builds on the study of successor representations for task generalization in online RL and extends the framework to the offline-to-online setting. We show that the conventional paradigm based on successor features cannot effectively exploit offline data or improve performance on the new task through online fine-tuning. To mitigate this, we introduce a novel methodology that leverages offline data to learn an ensemble of successor representations and subsequently constructs ensemble Q functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaptation of the Q functions to new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence of the superior performance of our method in generalizing to diverse or even unseen tasks.
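For readers unfamiliar with the underlying mechanism, the successor-feature framework that the abstract refers to decomposes a policy's value into a task-independent representation and a task-specific reward weight: if rewards are approximately linear in features, r(s, a) ≈ φ(s, a)ᵀw, then Q(s, a) = ψ(s, a)ᵀw, where ψ is the expected discounted sum of future features. The sketch below is an illustration of this general idea under those standard assumptions, not the authors' implementation; the names `fit_task_weights`, `ensemble_q`, and the toy data are hypothetical. It shows how an ensemble of successor representations could be combined into ensemble Q functions, and how adapting to a new task can reduce to re-estimating only the weight vector w from online rewards.

```python
import numpy as np

# Illustrative sketch of successor features (SF) with an ensemble.
# Assumption: rewards are approximately linear in a feature map phi,
#   r(s, a) ~= phi(s, a) @ w,
# so each Q function factorizes as Q(s, a) = psi(s, a) @ w, where
# psi(s, a) = E[ sum_t gamma^t phi(s_t, a_t) ] is the successor feature.

def fit_task_weights(phi_batch, reward_batch, reg=1e-3):
    """Estimate the task weight vector w by ridge regression on observed rewards.
    phi_batch: (N, d) feature matrix, reward_batch: (N,) rewards."""
    d = phi_batch.shape[1]
    A = phi_batch.T @ phi_batch + reg * np.eye(d)
    b = phi_batch.T @ reward_batch
    return np.linalg.solve(A, b)

def ensemble_q(psi_ensemble, w):
    """Combine an ensemble of successor features into a conservative Q estimate.
    psi_ensemble: (K, d) successor features for one (s, a) from K ensemble members.
    Taking the minimum over members is one common pessimistic choice; the mean
    or a lower confidence bound are alternatives."""
    q_members = psi_ensemble @ w          # (K,) per-member Q values
    return q_members.min()

# Toy usage: a new task only changes w; the successor features are reused.
rng = np.random.default_rng(0)
phi_online = rng.normal(size=(256, 8))             # features from online transitions
r_online = phi_online @ rng.normal(size=8) + 0.01 * rng.normal(size=256)
w_new = fit_task_weights(phi_online, r_online)     # fast adaptation: only w is re-fit
psi_sa = rng.normal(size=(5, 8))                   # K = 5 ensemble members for one (s, a)
print("ensemble Q estimate:", ensemble_q(psi_sa, w_new))
```

In the offline-to-online setting described in the abstract, the successor representations would be learned from the offline dataset, while a lightweight weight re-estimation of this kind is the sort of step that can be repeated quickly during online fine-tuning for a new task.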

Source journal: Science China Information Sciences (Computer Science, Information Systems)
CiteScore: 12.60
Self-citation rate: 5.70%
Articles per year: 224
Average review time: 8.3 months
Journal description: Science China Information Sciences is a dedicated journal that showcases high-quality, original research across various domains of information sciences. It encompasses Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for the dissemination of significant contributions in these fields.