PDRL: Towards Deeper States and Further Behaviors in Unsupervised Skill Discovery by Progressive Diversity

IF 4.9 | CAS Tier 3 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ziming He;Chao Song;Jingchen Li;Haobin Shi
{"title":"PDRL: Towards Deeper States and Further Behaviors in Unsupervised Skill Discovery by Progressive Diversity","authors":"Ziming He;Chao Song;Jingchen Li;Haobin Shi","doi":"10.1109/TCDS.2024.3471645","DOIUrl":null,"url":null,"abstract":"We present progressive diversity reinforcement learning (PDRL), an unsupervised reinforcement learning (URL) method for discovering diverse skills. PDRL encourages learning behaviors that span multiple steps, particularly by introducing “deeper states”—states that require a longer sequence of actions to reach without repetition. To address the challenges of weak skill diversity and weak exploration in partially observable environments, PDRL employs two indications for skill learning to foster exploration and skill diversity, emphasizing each observation and subtrajectory's accuracy compared to its predecessor. Skill latent variables are represented by mappings from states or trajectories, helping to distinguish and recover learned skills. This dual representation promotes exploration and skill diversity without additional modeling or prior knowledge. PDRL also integrates intrinsic rewards through a combination of observations and subtrajectories, effectively preventing skill duplication. Experiments across multiple benchmarks show that PDRL discovers a broader range of skills compared to existing methods. Additionally, pretraining with PDRL accelerates fine-tuning in goal-conditioned reinforcement learning (GCRL) tasks, as demonstrated in Fetch robotic manipulation tasks.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 3","pages":"495-509"},"PeriodicalIF":4.9000,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive and Developmental Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10704571/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

We present progressive diversity reinforcement learning (PDRL), an unsupervised reinforcement learning (URL) method for discovering diverse skills. PDRL encourages behaviors that span multiple steps, in particular by introducing "deeper states," i.e., states that require a longer sequence of actions to reach without repetition. To address weak skill diversity and weak exploration in partially observable environments, PDRL employs two indications for skill learning to foster exploration and skill diversity, emphasizing the accuracy of each observation and each subtrajectory relative to its predecessor. Skill latent variables are represented by mappings from states or trajectories, which helps distinguish and recover learned skills. This dual representation promotes exploration and skill diversity without additional modeling or prior knowledge. PDRL also derives intrinsic rewards from a combination of observations and subtrajectories, effectively preventing skill duplication. Experiments across multiple benchmarks show that PDRL discovers a broader range of skills than existing methods. Additionally, pretraining with PDRL accelerates fine-tuning on goal-conditioned reinforcement learning (GCRL) tasks, as demonstrated on Fetch robotic manipulation tasks.
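To make the reward structure the abstract describes more concrete, below is a minimal, hypothetical sketch of a discriminator-based intrinsic reward that combines an observation-level and a subtrajectory-level term. This is not the paper's actual implementation: the names (SkillDiscriminator, intrinsic_reward), the weighting coefficient alpha, and the use of a simple learned classifier for each signal are all illustrative assumptions; the exact objective is defined in the paper itself.

```python
# Hypothetical sketch: a skill is rewarded when BOTH the current observation
# and an embedding of the recent subtrajectory identify which latent skill z
# produced them, in the spirit of PDRL's combined intrinsic reward.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillDiscriminator(nn.Module):
    """Predicts which latent skill produced a given input (an observation
    or an encoded subtrajectory)."""
    def __init__(self, input_dim: int, n_skills: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_skills),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # logits over skills

def intrinsic_reward(obs_disc: SkillDiscriminator,
                     traj_disc: SkillDiscriminator,
                     obs: torch.Tensor,             # [B, obs_dim]
                     subtraj_emb: torch.Tensor,     # [B, traj_dim]
                     skill: torch.Tensor,           # [B], long indices
                     alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of log q(z | observation) and log q(z | subtrajectory):
    high only when both the state reached and the behavior segment that
    reached it are diagnostic of the active skill."""
    log_q_obs = F.log_softmax(obs_disc(obs), dim=-1)
    log_q_traj = F.log_softmax(traj_disc(subtraj_emb), dim=-1)
    r_obs = log_q_obs.gather(-1, skill.unsqueeze(-1)).squeeze(-1)
    r_traj = log_q_traj.gather(-1, skill.unsqueeze(-1)).squeeze(-1)
    return alpha * r_obs + (1.0 - alpha) * r_traj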
Source journal metrics: CiteScore 7.20 | Self-citation rate: 10.00% | Annual articles: 170
Journal description: The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.