{"title":"持续深度强化学习的课程目标掩蔽","authors":"Manfred Eppe, S. Magg, S. Wermter","doi":"10.1109/DEVLRN.2019.8850721","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning has recently gained a focus on problems where policy or value functions are based on universal value function approximators (UVFAs) which renders them independent of goals. Evidence exists that the sampling of goals has a strong effect on the learning performance, and the problem of optimizing the goal sampling is frequently tackled with intrinsic motivation methods. However, there is a lack of general mechanisms that focus on goal sampling in the context of deep reinforcement learning based on UVFAs. In this work, we introduce goal masking as a method to estimate a goal's difficulty level and to exploit this estimation to realize curriculum learning. Our results indicate that focusing on goals with a medium difficulty level is appropriate for deep deterministic policy gradient (DDPG) methods, while an “aim for the stars and reach the moon-strategy”, where difficult goals are sampled much more often than simple goals, leads to the best learning performance in cases where DDPG is combined with hindsight experience replay (HER).","PeriodicalId":318973,"journal":{"name":"2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Curriculum goal masking for continuous deep reinforcement learning\",\"authors\":\"Manfred Eppe, S. Magg, S. 
Wermter\",\"doi\":\"10.1109/DEVLRN.2019.8850721\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep reinforcement learning has recently gained a focus on problems where policy or value functions are based on universal value function approximators (UVFAs) which renders them independent of goals. Evidence exists that the sampling of goals has a strong effect on the learning performance, and the problem of optimizing the goal sampling is frequently tackled with intrinsic motivation methods. However, there is a lack of general mechanisms that focus on goal sampling in the context of deep reinforcement learning based on UVFAs. In this work, we introduce goal masking as a method to estimate a goal's difficulty level and to exploit this estimation to realize curriculum learning. Our results indicate that focusing on goals with a medium difficulty level is appropriate for deep deterministic policy gradient (DDPG) methods, while an “aim for the stars and reach the moon-strategy”, where difficult goals are sampled much more often than simple goals, leads to the best learning performance in cases where DDPG is combined with hindsight experience replay (HER).\",\"PeriodicalId\":318973,\"journal\":{\"name\":\"2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics 
(ICDL-EpiRob)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DEVLRN.2019.8850721\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DEVLRN.2019.8850721","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Curriculum goal masking for continuous deep reinforcement learning
Deep reinforcement learning has recently seen a focus on problems where policy or value functions are based on universal value function approximators (UVFAs), which makes them independent of any single, fixed goal. There is evidence that the sampling of goals has a strong effect on learning performance, and the problem of optimizing goal sampling is frequently tackled with intrinsic motivation methods. However, there is a lack of general mechanisms that focus on goal sampling in the context of deep reinforcement learning based on UVFAs. In this work, we introduce goal masking as a method to estimate a goal's difficulty level and exploit this estimation to realize curriculum learning. Our results indicate that focusing on goals of medium difficulty is appropriate for deep deterministic policy gradient (DDPG) methods, while an "aim for the stars and reach the moon" strategy, where difficult goals are sampled much more often than simple goals, leads to the best learning performance when DDPG is combined with hindsight experience replay (HER).
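The curriculum idea the abstract describes — biasing goal sampling toward goals of an estimated difficulty level — might be sketched as below. This is a minimal illustration, not the paper's actual goal-masking mechanism: all names are hypothetical, and difficulty is approximated here by an empirical per-goal success rate rather than by masking goal components.

```python
import math
import random

def goal_weights(success_rates, target_rate=0.5, temperature=0.1):
    """Weight each goal by how close its empirical success rate is to a
    target rate. target_rate=0.5 favors medium-difficulty goals (the
    plain-DDPG setting); a low target_rate such as 0.1 favors hard goals,
    mimicking the "aim for the stars" strategy reported for DDPG+HER."""
    return [math.exp(-abs(r - target_rate) / temperature)
            for r in success_rates]

def sample_goal(goals, success_rates, target_rate=0.5, rng=random):
    """Sample one goal with probability proportional to its weight."""
    w = goal_weights(success_rates, target_rate)
    return rng.choices(goals, weights=w, k=1)[0]

# Example: three goals whose observed success rates mark them as
# easy (0.9), medium (0.5), and hard (0.1).
goals = ["easy", "medium", "hard"]
rates = [0.9, 0.5, 0.1]
g = sample_goal(goals, rates, target_rate=0.5)  # usually "medium"
```

In a training loop, the success rates would be updated online from rollout outcomes, so the sampling distribution shifts as the agent improves and previously hard goals become mastered.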