Probabilistic Multi-knowledge Transfer in Reinforcement Learning

Daniel Fernández, F. Fernández, Javier García
{"title":"Probabilistic Multi-knowledge Transfer in Reinforcement Learning","authors":"Daniel Fernández, F. Fernández, Javier García","doi":"10.1109/ICMLA52953.2021.00079","DOIUrl":null,"url":null,"abstract":"Transfer in Reinforcement Learning (RL) aims to remedy the problem of learning complex RL tasks from scratch, which is impractical in most of the cases due to the huge sample requirements. To overcome this problem, transferring the knowledge acquired from a set of source tasks to a new target task is a core idea. This knowledge can be the policy, the model (state transition and/or reward function), or the value function learned in the source tasks. However, algorithms in transfer learning focus on transferring a single type of knowledge at a time, although intuitively it might be interesting to reuse several types of this knowledge. For this reason, in this paper we propose a multi-knowledge transfer RL algorithm which we call Probabilistic Transfer of Policies and Models (PTPM). PTPM, unlike single-knowledge transfer approaches, combines the transfer of two types of knowledge: policies and models. We show through different experiments on two well-known domains (Grid World and Mountain Car) how this novel multi-knowledge transfer algorithm improves the results of the two methods in which it is inspired separately. As an additional result, we show that sequential learning of multiple tasks is generally better than learning from a library of previously learned tasks from scratch.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"574 1","pages":"471-476"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA52953.2021.00079","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Transfer in Reinforcement Learning (RL) aims to remedy the problem of learning complex RL tasks from scratch, which is impractical in most cases due to the huge sample requirements. The core idea is to transfer the knowledge acquired from a set of source tasks to a new target task. This knowledge can be the policy, the model (state transition and/or reward function), or the value function learned in the source tasks. However, transfer learning algorithms typically focus on transferring a single type of knowledge at a time, although intuitively it may be beneficial to reuse several types of knowledge at once. For this reason, in this paper we propose a multi-knowledge transfer RL algorithm that we call Probabilistic Transfer of Policies and Models (PTPM). PTPM, unlike single-knowledge transfer approaches, combines the transfer of two types of knowledge: policies and models. Through experiments on two well-known domains (Grid World and Mountain Car), we show how this novel multi-knowledge transfer algorithm improves on the results of the two single-knowledge methods that inspired it, each applied separately. As an additional result, we show that sequential learning of multiple tasks is generally better than learning from a library of previously learned tasks from scratch.
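The abstract describes PTPM only at a high level. As a rough, illustrative sketch of the general idea it builds on, the Python fragment below mixes probabilistic policy reuse (occasionally following a transferred source policy instead of the target-task policy) with a Dyna-style extra backup drawn from a transferred model. Everything here is an assumption for illustration, not the authors' actual algorithm: the function names, the interface of `env_step`/`env_reset`, and the psi/epsilon/alpha/gamma parameters are hypothetical.

```python
import random

def reuse_action(state, q_table, source_policy, psi, epsilon, n_actions):
    """Action selection in the spirit of probabilistic policy reuse (illustrative only).

    With probability psi follow the transferred source policy; otherwise act
    epsilon-greedily on the Q-values learned so far for the target task.
    """
    if random.random() < psi:
        return source_policy(state)             # reuse transferred policy
    if random.random() < epsilon:
        return random.randrange(n_actions)      # explore
    qs = [q_table.get((state, a), 0.0) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: qs[a])


def run_episode(env_reset, env_step, q_table, source_policy, transferred_model,
                psi, epsilon=0.1, alpha=0.5, gamma=0.95, n_actions=4, max_steps=200):
    """Sketch of one target-task episode that combines policy and model transfer.

    - Policy transfer: reuse_action() occasionally follows the source policy.
    - Model transfer: after each real transition, an extra Dyna-style backup is
      drawn from the transferred model (an assumption here, not the paper's rule).
    """
    state = env_reset()
    for _ in range(max_steps):
        action = reuse_action(state, q_table, source_policy, psi, epsilon, n_actions)
        next_state, reward, done = env_step(state, action)

        # Q-learning backup on the real experience.
        best_next = max(q_table.get((next_state, a), 0.0) for a in range(n_actions))
        old = q_table.get((state, action), 0.0)
        q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)

        # Extra backup using the transferred model, if it covers this state-action pair.
        if (state, action) in transferred_model:
            m_next, m_reward = transferred_model[(state, action)]
            best_m = max(q_table.get((m_next, a), 0.0) for a in range(n_actions))
            old = q_table.get((state, action), 0.0)
            q_table[(state, action)] = old + alpha * (m_reward + gamma * best_m - old)

        if done:
            break
        state = next_state
    return q_table
```

In schemes of this family, the reuse probability psi is typically decayed across episodes, so the agent leans on the transferred source policy early on and on its own value estimates once they become reliable.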