持续深度强化学习的课程目标掩蔽

2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) Pub Date : 2018-09-17 DOI:10.1109/DEVLRN.2019.8850721

Manfred Eppe, S. Magg, S. Wermter

{"title":"持续深度强化学习的课程目标掩蔽","authors":"Manfred Eppe, S. Magg, S. Wermter","doi":"10.1109/DEVLRN.2019.8850721","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning has recently gained a focus on problems where policy or value functions are based on universal value function approximators (UVFAs) which renders them independent of goals. Evidence exists that the sampling of goals has a strong effect on the learning performance, and the problem of optimizing the goal sampling is frequently tackled with intrinsic motivation methods. However, there is a lack of general mechanisms that focus on goal sampling in the context of deep reinforcement learning based on UVFAs. In this work, we introduce goal masking as a method to estimate a goal's difficulty level and to exploit this estimation to realize curriculum learning. Our results indicate that focusing on goals with a medium difficulty level is appropriate for deep deterministic policy gradient (DDPG) methods, while an “aim for the stars and reach the moon-strategy”, where difficult goals are sampled much more often than simple goals, leads to the best learning performance in cases where DDPG is combined with hindsight experience replay (HER).","PeriodicalId":318973,"journal":{"name":"2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Curriculum goal masking for continuous deep reinforcement learning\",\"authors\":\"Manfred Eppe, S. Magg, S. Wermter\",\"doi\":\"10.1109/DEVLRN.2019.8850721\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep reinforcement learning has recently gained a focus on problems where policy or value functions are based on universal value function approximators (UVFAs) which renders them independent of goals. Evidence exists that the sampling of goals has a strong effect on the learning performance, and the problem of optimizing the goal sampling is frequently tackled with intrinsic motivation methods. However, there is a lack of general mechanisms that focus on goal sampling in the context of deep reinforcement learning based on UVFAs. In this work, we introduce goal masking as a method to estimate a goal's difficulty level and to exploit this estimation to realize curriculum learning. Our results indicate that focusing on goals with a medium difficulty level is appropriate for deep deterministic policy gradient (DDPG) methods, while an “aim for the stars and reach the moon-strategy”, where difficult goals are sampled much more often than simple goals, leads to the best learning performance in cases where DDPG is combined with hindsight experience replay (HER).\",\"PeriodicalId\":318973,\"journal\":{\"name\":\"2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DEVLRN.2019.8850721\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DEVLRN.2019.8850721","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

摘要

深度强化学习最近引起了人们对策略或价值函数基于通用值函数近似器(uvfa)的问题的关注，这使得它们独立于目标。有证据表明，目标抽样对学习成绩有很强的影响，而优化目标抽样的问题往往是用内在动机方法来解决的。然而，在基于uvfa的深度强化学习背景下，缺乏关注目标采样的通用机制。在这项工作中，我们引入目标掩蔽作为一种估计目标难度的方法，并利用这种估计来实现课程学习。我们的研究结果表明，专注于中等难度水平的目标适用于深度确定性策略梯度(DDPG)方法，而“瞄准星星，到达月亮策略”，其中难度目标的采样频率远高于简单目标，在DDPG与后见之明经验回放(HER)相结合的情况下，可以获得最佳的学习性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Curriculum goal masking for continuous deep reinforcement learning

Deep reinforcement learning has recently gained a focus on problems where policy or value functions are based on universal value function approximators (UVFAs) which renders them independent of goals. Evidence exists that the sampling of goals has a strong effect on the learning performance, and the problem of optimizing the goal sampling is frequently tackled with intrinsic motivation methods. However, there is a lack of general mechanisms that focus on goal sampling in the context of deep reinforcement learning based on UVFAs. In this work, we introduce goal masking as a method to estimate a goal's difficulty level and to exploit this estimation to realize curriculum learning. Our results indicate that focusing on goals with a medium difficulty level is appropriate for deep deterministic policy gradient (DDPG) methods, while an “aim for the stars and reach the moon-strategy”, where difficult goals are sampled much more often than simple goals, leads to the best learning performance in cases where DDPG is combined with hindsight experience replay (HER).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)

自引率

0.00%

发文量