{"title":"强化学习中的概率多知识转移","authors":"Daniel Fernández, F. Fernández, Javier García","doi":"10.1109/ICMLA52953.2021.00079","DOIUrl":null,"url":null,"abstract":"Transfer in Reinforcement Learning (RL) aims to remedy the problem of learning complex RL tasks from scratch, which is impractical in most of the cases due to the huge sample requirements. To overcome this problem, transferring the knowledge acquired from a set of source tasks to a new target task is a core idea. This knowledge can be the policy, the model (state transition and/or reward function), or the value function learned in the source tasks. However, algorithms in transfer learning focus on transferring a single type of knowledge at a time, although intuitively it might be interesting to reuse several types of this knowledge. For this reason, in this paper we propose a multi-knowledge transfer RL algorithm which we call Probabilistic Transfer of Policies and Models (PTPM). PTPM, unlike single-knowledge transfer approaches, combines the transfer of two types of knowledge: policies and models. We show through different experiments on two well-known domains (Grid World and Mountain Car) how this novel multi-knowledge transfer algorithm improves the results of the two methods in which it is inspired separately. As an additional result, we show that sequential learning of multiple tasks is generally better than learning from a library of previously learned tasks from scratch.","PeriodicalId":6750,"journal":{"name":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"574 1","pages":"471-476"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Probabilistic Multi-knowledge Transfer in Reinforcement Learning\",\"authors\":\"Daniel Fernández, F. Fernández, Javier García\",\"doi\":\"10.1109/ICMLA52953.2021.00079\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Transfer in Reinforcement Learning (RL) aims to remedy the problem of learning complex RL tasks from scratch, which is impractical in most of the cases due to the huge sample requirements. To overcome this problem, transferring the knowledge acquired from a set of source tasks to a new target task is a core idea. This knowledge can be the policy, the model (state transition and/or reward function), or the value function learned in the source tasks. However, algorithms in transfer learning focus on transferring a single type of knowledge at a time, although intuitively it might be interesting to reuse several types of this knowledge. For this reason, in this paper we propose a multi-knowledge transfer RL algorithm which we call Probabilistic Transfer of Policies and Models (PTPM). PTPM, unlike single-knowledge transfer approaches, combines the transfer of two types of knowledge: policies and models. We show through different experiments on two well-known domains (Grid World and Mountain Car) how this novel multi-knowledge transfer algorithm improves the results of the two methods in which it is inspired separately. 
As an additional result, we show that sequential learning of multiple tasks is generally better than learning from a library of previously learned tasks from scratch.\",\"PeriodicalId\":6750,\"journal\":{\"name\":\"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"volume\":\"574 1\",\"pages\":\"471-476\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA52953.2021.00079\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA52953.2021.00079","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Transfer in Reinforcement Learning (RL) aims to remedy the problem of learning complex RL tasks from scratch, which is impractical in most cases due to the huge sample requirements. To overcome this problem, the core idea is to transfer the knowledge acquired from a set of source tasks to a new target task. This knowledge can be the policy, the model (state transition and/or reward function), or the value function learned in the source tasks. However, transfer learning algorithms typically focus on transferring a single type of knowledge at a time, although intuitively it may be beneficial to reuse several types of knowledge at once. For this reason, in this paper we propose a multi-knowledge transfer RL algorithm which we call Probabilistic Transfer of Policies and Models (PTPM). PTPM, unlike single-knowledge transfer approaches, combines the transfer of two types of knowledge: policies and models. We show through experiments on two well-known domains (Grid World and Mountain Car) how this novel multi-knowledge transfer algorithm improves on the results of the two methods that inspired it, each considered separately. As an additional result, we show that learning multiple tasks sequentially is generally better than learning from a library of tasks that were each previously learned from scratch.
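The abstract only sketches PTPM at a high level, so the following is a hedged toy illustration, not the authors' actual algorithm: it shows the general idea of combining two transferred knowledge types, where a source-task policy is reused with a decaying probability (in the spirit of probabilistic policy reuse) while a transferred transition/reward model drives extra simulated value updates (Dyna-style planning). Every identifier and hyperparameter here (chain_step, psi, n_planning, ...) is an assumption made for the sketch.

```python
# Minimal, illustrative sketch only -- NOT the paper's actual PTPM algorithm.
# It combines two transferred knowledge sources on a toy chain MDP:
#   (1) a source-task policy, reused probabilistically, and
#   (2) a source-task model, used for extra simulated updates (Dyna style).
import random

N_STATES, N_ACTIONS, GOAL = 6, 2, 5            # chain MDP: action 0 = left, 1 = right

def chain_step(state, action):
    """True environment dynamics of the toy target task."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

# Knowledge assumed to be transferred from previously solved source tasks:
source_policy = lambda s: 1                     # a past policy: always move right
source_model = {(s, a): chain_step(s, a)[:2]    # learned transition/reward model
                for s in range(GOAL) for a in range(N_ACTIONS)}

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.95, 0.1
psi, psi_decay = 0.9, 0.95                      # prob. of following the source policy
n_planning = 5                                  # model-based updates per real step

def q_update(s, a, r, s2):
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

for episode in range(50):
    s, done = 0, False
    while not done:
        # Probabilistic knowledge selection: reuse the transferred policy with
        # probability psi, otherwise act eps-greedily on the target Q-function.
        if random.random() < psi:
            a = source_policy(s)
        elif random.random() < eps:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s2, r, done = chain_step(s, a)
        q_update(s, a, r, s2)                   # direct update from real experience
        for _ in range(n_planning):             # extra updates from the transferred model
            ps, pa = random.randrange(GOAL), random.randrange(N_ACTIONS)
            ms2, mr = source_model[(ps, pa)]
            q_update(ps, pa, mr, ms2)
        s = s2
    psi *= psi_decay                            # rely less on the source policy over time

greedy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print("Greedy action per state:", greedy)       # expected: mostly 1 (move right)
```

In this sketch the probability psi controls how strongly past policies bias early exploration, while the model-based updates extract more value from each real sample; these correspond, loosely, to the two transfer channels (policies and models) the abstract refers to.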