{"title":"基于Dropout的异步多任务连续控制强化学习","authors":"Z. Jiao, J. Oh","doi":"10.1109/ICMLA.2019.00099","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning is sample inefficient for solving complex tasks. Recently, multitask reinforcement learning has received increased attention because of its ability to learn general policies with improved sample efficiency. In multitask reinforcement learning, a single agent must learn multiple related tasks, either sequentially or simultaneously. Based on the DDPG algorithm, this paper presents Asyn-DDPG, which asynchronously learns a multitask policy for continuous control with simultaneous worker agents. We empirically found that sparse policy gradients can significantly reduce interference among conflicting tasks and make multitask learning more stable and sample efficient. To ensure the sparsity of gradients evaluated for each task, Asyn-DDPG represents both actor and critic functions as deep neural networks and regularizes them using Dropout. During training, worker agents share the actor and the critic functions, and asynchronously optimize them using task-specific gradients. For evaluating Asyn-DDPG, we proposed robotic navigation tasks based on realistically simulated robots and physics-enabled maze-like environments. Although the number of tasks used in our experiment is small, each task is conducted based on a real-world setting and posts a challenging environment. Through extensive evaluation, we demonstrate that Dropout regularization can effectively stabilize asynchronous learning and enable Asyn-DDPG to outperform DDPG significantly. Also, Asyn-DDPG was able to learn a multitask policy that can be well generalized for handling environments unseen during training.","PeriodicalId":436714,"journal":{"name":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Asynchronous Multitask Reinforcement Learning with Dropout for Continuous Control\",\"authors\":\"Z. Jiao, J. Oh\",\"doi\":\"10.1109/ICMLA.2019.00099\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep reinforcement learning is sample inefficient for solving complex tasks. Recently, multitask reinforcement learning has received increased attention because of its ability to learn general policies with improved sample efficiency. In multitask reinforcement learning, a single agent must learn multiple related tasks, either sequentially or simultaneously. Based on the DDPG algorithm, this paper presents Asyn-DDPG, which asynchronously learns a multitask policy for continuous control with simultaneous worker agents. We empirically found that sparse policy gradients can significantly reduce interference among conflicting tasks and make multitask learning more stable and sample efficient. To ensure the sparsity of gradients evaluated for each task, Asyn-DDPG represents both actor and critic functions as deep neural networks and regularizes them using Dropout. During training, worker agents share the actor and the critic functions, and asynchronously optimize them using task-specific gradients. For evaluating Asyn-DDPG, we proposed robotic navigation tasks based on realistically simulated robots and physics-enabled maze-like environments. 
Although the number of tasks used in our experiment is small, each task is conducted based on a real-world setting and posts a challenging environment. Through extensive evaluation, we demonstrate that Dropout regularization can effectively stabilize asynchronous learning and enable Asyn-DDPG to outperform DDPG significantly. Also, Asyn-DDPG was able to learn a multitask policy that can be well generalized for handling environments unseen during training.\",\"PeriodicalId\":436714,\"journal\":{\"name\":\"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)\",\"volume\":\"77 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA.2019.00099\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2019.00099","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Asynchronous Multitask Reinforcement Learning with Dropout for Continuous Control
Deep reinforcement learning is sample-inefficient for solving complex tasks. Recently, multitask reinforcement learning has received increased attention because of its ability to learn general policies with improved sample efficiency. In multitask reinforcement learning, a single agent must learn multiple related tasks, either sequentially or simultaneously. Building on the DDPG algorithm, this paper presents Asyn-DDPG, which asynchronously learns a multitask policy for continuous control with simultaneously running worker agents. We found empirically that sparse policy gradients can significantly reduce interference among conflicting tasks and make multitask learning more stable and sample-efficient. To ensure the sparsity of the gradients evaluated for each task, Asyn-DDPG represents both the actor and critic functions as deep neural networks and regularizes them using Dropout. During training, worker agents share the actor and critic functions and asynchronously optimize them using task-specific gradients. To evaluate Asyn-DDPG, we propose robotic navigation tasks based on realistically simulated robots and physics-enabled maze-like environments. Although the number of tasks in our experiments is small, each task is grounded in a real-world setting and poses a challenging environment. Through extensive evaluation, we demonstrate that Dropout regularization can effectively stabilize asynchronous learning and enable Asyn-DDPG to significantly outperform DDPG. Moreover, Asyn-DDPG learns a multitask policy that generalizes well to environments unseen during training.
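The abstract describes the mechanism only at a high level: a shared actor and critic, each a deep network regularized with Dropout, updated asynchronously by per-task workers with DDPG-style gradients. The sketch below illustrates what such Dropout-regularized actor and critic networks and a single worker update could look like in PyTorch; the layer widths, dropout rate, and the omission of target networks are simplifying assumptions for illustration, not the architecture or training details reported in the paper.

```python
# Illustrative sketch (not the authors' code): Dropout-regularized actor and
# critic in the style of DDPG, intended to be shared by asynchronous per-task
# workers. Layer widths, dropout rate, and the use of the online networks as
# bootstrap targets (instead of separate target networks) are assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy mu(s) -> a; Dropout zeroes random units, so each
    backward pass produces gradients on only a sparse subset of parameters."""
    def __init__(self, state_dim, action_dim, hidden=256, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a), also regularized with Dropout."""
    def __init__(self, state_dim, action_dim, hidden=256, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def worker_update(actor, critic, actor_opt, critic_opt, batch, gamma=0.99):
    """One DDPG-style update from a single task's batch of transitions.

    `batch` is (state, action, reward, next_state, done) sampled by one worker
    from its own task; the actor/critic parameters are shared across workers.
    """
    state, action, reward, next_state, done = batch

    # Critic: regress Q(s, a) toward the one-step bootstrapped target.
    with torch.no_grad():
        target_q = reward + gamma * (1.0 - done) * critic(next_state, actor(next_state))
    critic_loss = nn.functional.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the deterministic policy gradient through the critic.
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

In this reading of the abstract, each worker would repeatedly call something like `worker_update` on batches drawn from its own task. Because Dropout masks a random subset of units on every forward pass, each task's gradient touches only a sparse subset of the shared parameters, which matches the interference-reduction intuition the abstract gives for why sparse policy gradients stabilize asynchronous multitask learning.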