Online Prototype Alignment for Few-shot Policy Transfer

Proceedings of the ... International Conference on Machine Learning. Pub Date: 2023-06-12. DOI: 10.48550/arXiv.2306.07307

Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Yunkai Gao, Kaizhao Yuan, Rui Chen, Siming Lan, Xingui Hu, Zidong Du, Xishan Zhang, Qi Guo, Yunji Chen

Pages: 39968-39983

Citations: 1

Abstract

Domain adaptation in reinforcement learning (RL) mainly deals with changes in observations when a policy is transferred to a new environment. Many traditional approaches to domain adaptation in RL learn a mapping function between the source and target domains, either explicitly or implicitly. However, they typically require access to abundant data from the target domain. In addition, they often rely on visual cues to learn the mapping function and may fail when the source domain looks quite different from the target domain. To address these problems, we propose a novel framework, Online Prototype Alignment (OPA), which learns the mapping function based on the functional similarity of elements and achieves few-shot policy transfer within only a few episodes. The key insight of OPA is to introduce an exploration mechanism that interacts with the unseen elements of the target domain in an efficient and purposeful manner, and then connects them with the seen elements of the source domain according to their functionality (rather than visual cues). Experimental results show that when the target domain looks visually different from the source domain, OPA achieves better transfer performance than prior methods, even with far fewer samples from the target domain.
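As a rough illustration of what "connecting elements according to their functionality" could look like, the sketch below matches unseen target-domain elements to source-domain elements by comparing the effects observed when an agent interacts with them. This is not the OPA algorithm from the paper: the function names, the three effect features (mean reward, blocks movement, ends episode), the example values, and the brute-force matching are all assumptions chosen to keep the example small.

```python
from itertools import permutations


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length effect vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_by_function(source_prototypes, target_effects):
    """Map each target element to a distinct source element.

    Both arguments are dicts of name -> effect vector. The mapping that
    minimises the total squared distance is returned. Brute force is used,
    so this only suits a handful of elements, and it returns an empty dict
    if there are more target elements than source elements.
    """
    source_names = list(source_prototypes)
    target_names = list(target_effects)
    best_cost, best_map = float("inf"), {}
    for candidate in permutations(source_names, len(target_names)):
        cost = sum(
            squared_distance(target_effects[t], source_prototypes[s])
            for t, s in zip(target_names, candidate)
        )
        if cost < best_cost:
            best_cost, best_map = cost, dict(zip(target_names, candidate))
    return best_map


# Hypothetical effect vectors: (mean reward, blocks movement, ends episode).
source_prototypes = {
    "wall": (0.0, 1.0, 0.0),
    "coin": (1.0, 0.0, 0.0),
    "lava": (-1.0, 0.0, 1.0),
}

# Hypothetical averages gathered from a few target-domain episodes, where
# the elements look nothing like their source-domain counterparts.
target_effects = {
    "blue_blob": (0.9, 0.0, 0.0),
    "striped_tile": (0.0, 0.9, 0.0),
    "green_square": (-0.8, 0.1, 1.0),
}

mapping = align_by_function(source_prototypes, target_effects)
print(mapping)
```

Under these assumed values, each target element is paired with the source element whose interaction effects it most closely reproduces, regardless of appearance. A source-trained policy could then, in principle, treat `blue_blob` as it would treat `coin`.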

