Robust Policy Learning via Offline Skill Diffusion

ArXiv Pub Date: 2024-03-01 | DOI: 10.1609/aaai.v38i12.29217

Woo Kyung Kim, Minjong Yoo, Honguk Woo

Abstract

Skill-based reinforcement learning (RL) approaches have shown considerable promise, especially in solving long-horizon tasks via hierarchical structures. Skills learned task-agnostically from offline datasets can accelerate policy learning for new tasks. Yet their application to different domains remains restricted by their inherent dependency on the datasets, which makes it challenging to learn a skill-based policy via RL for a target domain that differs from the domains of the datasets. In this paper, we present DuSkill, a novel offline skill learning framework that employs a guided Diffusion model to generate versatile skills extended from the limited skills in the datasets, thereby enhancing the robustness of policy learning for tasks in different domains. Specifically, we devise a guided diffusion-based skill decoder, combined with hierarchical encoding, that disentangles the skill embedding space into two distinct representations: one encapsulating domain-invariant behaviors and the other capturing the factors that induce domain variations in those behaviors. Our DuSkill framework enhances the diversity of skills learned offline, thus accelerating the learning of high-level policies for different domains. Through experiments on several long-horizon tasks, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms, demonstrating its benefits in both few-shot imitation and online RL.
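The abstract describes the guided skill decoder only at a high level. As a rough illustration, the following is a minimal sketch that assumes a DDPM-style decoder (Ho et al., 2020) whose noise estimate is steered by two condition vectors, a domain-invariant embedding `z_inv` and a domain-variant embedding `z_dom`, combined hierarchically in the style of classifier-free guidance. The guidance form, the weights `w_inv` and `w_dom`, and the functions `guided_eps`, `ddpm_sample` and `toy_eps` are hypothetical and are not taken from DuSkill itself; `toy_eps` merely stands in for a trained noise-prediction network, and the sampled vector stands in for a (flattened) low-level action sequence.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical noise-predictor signature: eps(x_t, t, z_inv, z_dom) -> predicted noise.
# Passing None for a condition means "dropped" (the unconditional branch).
NoisePredictor = Callable[[Vector, int, Optional[Vector], Optional[Vector]], Vector]


def guided_eps(eps: NoisePredictor, x: Vector, t: int,
               z_inv: Vector, z_dom: Vector,
               w_inv: float, w_dom: float) -> Vector:
    """Assumed hierarchical classifier-free guidance over two conditions."""
    e_none = eps(x, t, None, None)    # no conditioning
    e_inv = eps(x, t, z_inv, None)    # domain-invariant condition only
    e_both = eps(x, t, z_inv, z_dom)  # both conditions
    # Unconditional estimate, plus a weighted push toward z_inv,
    # plus a weighted push toward the extra domain-specific factor z_dom.
    return [n + w_inv * (i - n) + w_dom * (b - i)
            for n, i, b in zip(e_none, e_inv, e_both)]


def ddpm_sample(eps: NoisePredictor, dim: int, betas: Sequence[float],
                z_inv: Vector, z_dom: Vector,
                w_inv: float = 1.5, w_dom: float = 1.5,
                seed: int = 0) -> Vector:
    """Ancestral DDPM sampling using the guided noise estimate."""
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]
    alpha_bars: List[float] = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product \bar{alpha}_t
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        e = guided_eps(eps, x, t, z_inv, z_dom, w_inv, w_dom)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, e)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # simple choice: sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise added at the final step
    return x


def toy_eps(x: Vector, t: int, z_inv: Optional[Vector],
            z_dom: Optional[Vector]) -> Vector:
    # Hypothetical stand-in for a trained network (ignores t): its estimate is
    # proportional to x minus the sum of whichever conditions are supplied.
    target = [0.0] * len(x)
    for z in (z_inv, z_dom):
        if z is not None:
            target = [a + b for a, b in zip(target, z)]
    return [0.5 * (xi - ti) for xi, ti in zip(x, target)]


if __name__ == "__main__":
    betas = [0.05 * (k + 1) for k in range(10)]  # toy linear schedule, T = 10
    action = ddpm_sample(toy_eps, dim=2, betas=betas,
                         z_inv=[1.0, 0.0], z_dom=[0.0, 0.5], seed=42)
    print([round(a, 3) for a in action])
```

Under this assumed formulation, setting both weights to 1 recovers the plain fully conditional prediction and setting both to 0 recovers the unconditional one; `w_inv` controls adherence to the shared, domain-invariant behavior, while `w_dom` controls how strongly the domain-specific factor in `z_dom` shapes the generated actions.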
