{"title":"Celebrating Robustness in Efficient Off-Policy Meta-Reinforcement Learning","authors":"Ziyi Liu, Zongyuan Li, Qianqian Cao, Yuan Wan, Xian Guo","doi":"10.1109/RCAR54675.2022.9872291","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning algorithms can enable agents to learn policies for complex tasks without expert knowledge. However, the learned policies are typically specialized to one specific task and can not generalize to new tasks. While meta-reinforcement learning (meta-RL) algorithms can enable agents to solve new tasks based on prior experience, most of them build on on-policy reinforcement learning algorithms which require large amounts of samples during meta-training and do not consider task-specific features across different tasks and thus make it very difficult to train an agent with high performance. To address these challenges, in this paper, we propose an off-policy meta-RL algorithm abbreviated as CRL (Celebrating Robustness Learning) that disentangles task-specific policy parameters by an adapter network to shared low-level parameters, learns a probabilistic latent space to extract universal information across different tasks and perform temporal-extended exploration. Our approach outperforms baseline methods both in sample efficiency and asymptotic performance on several meta-RL benchmarks.","PeriodicalId":304963,"journal":{"name":"2022 IEEE International Conference on Real-time Computing and Robotics (RCAR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Real-time Computing and Robotics (RCAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RCAR54675.2022.9872291","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Deep reinforcement learning algorithms can enable agents to learn policies for complex tasks without expert knowledge. However, the learned policies are typically specialized to a single task and cannot generalize to new ones. Meta-reinforcement learning (meta-RL) algorithms allow agents to solve new tasks based on prior experience, but most of them build on on-policy reinforcement learning, which requires large amounts of samples during meta-training, and they do not account for task-specific features across different tasks, making it difficult to train a high-performing agent. To address these challenges, in this paper we propose an off-policy meta-RL algorithm, CRL (Celebrating Robustness Learning), that disentangles task-specific policy parameters from shared low-level parameters via an adapter network, learns a probabilistic latent space to extract universal information across tasks, and performs temporally extended exploration. Our approach outperforms baseline methods in both sample efficiency and asymptotic performance on several meta-RL benchmarks.
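
The paper itself provides no code, but the two ideas the abstract names (a probabilistic latent task variable inferred from context, and an adapter network that modulates shared low-level policy parameters per task) can be illustrated with a minimal PyTorch sketch. All module names, layer sizes, and the FiLM-style conditioning below are assumptions chosen for illustration, not the authors' actual CRL implementation.

```python
# Illustrative sketch only: module names (TaskEncoder, AdapterPolicy),
# layer sizes, and the FiLM-style adapter are assumptions, not the
# authors' implementation of CRL.
import torch
import torch.nn as nn


class TaskEncoder(nn.Module):
    """Maps a batch of context transitions to a Gaussian over a latent task variable z."""

    def __init__(self, context_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-std
        )
        self.latent_dim = latent_dim

    def forward(self, context):  # context: (num_transitions, context_dim)
        stats = self.net(context).mean(dim=0)   # aggregate over transitions
        mu, log_std = stats.split(self.latent_dim)
        std = log_std.clamp(-5, 2).exp()
        return torch.distributions.Normal(mu, std)  # q(z | context)


class AdapterPolicy(nn.Module):
    """Shared low-level policy trunk plus a small task-specific adapter.

    The adapter takes the latent task variable z and produces a scale/shift
    that modulates the shared hidden features, keeping task-specific
    parameters separate from the shared low-level ones.
    """

    def __init__(self, obs_dim, act_dim, latent_dim, hidden=256):
        super().__init__()
        self.trunk1 = nn.Linear(obs_dim, hidden)          # shared parameters
        self.trunk2 = nn.Linear(hidden, hidden)            # shared parameters
        self.head = nn.Linear(hidden, act_dim)
        self.adapter = nn.Linear(latent_dim, 2 * hidden)   # task-specific modulation

    def forward(self, obs, z):
        scale, shift = self.adapter(z).chunk(2, dim=-1)
        h = torch.relu(self.trunk1(obs))
        h = torch.relu(self.trunk2(h) * (1 + scale) + shift)
        return torch.tanh(self.head(h))


# Usage: sample one z per episode from the inferred posterior and keep it
# fixed while acting, which gives the temporally extended exploration the
# abstract refers to. Dimensions below are arbitrary placeholders.
encoder = TaskEncoder(context_dim=20, latent_dim=5)
policy = AdapterPolicy(obs_dim=17, act_dim=6, latent_dim=5)
context = torch.randn(32, 20)      # transitions collected so far for this task
z = encoder(context).rsample()     # one latent sample per episode
action = policy(torch.randn(17), z)
```

In this kind of design, only the adapter and encoder carry task-dependent information, so the shared trunk can be trained off-policy from a replay buffer pooled across tasks; whether CRL organizes its parameters exactly this way is not specified in the abstract.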