PDRL: Towards Deeper States and Further Behaviors in Unsupervised Skill Discovery by Progressive Diversity
Authors: Ziming He; Chao Song; Jingchen Li; Haobin Shi
Journal: IEEE Transactions on Cognitive and Developmental Systems, vol. 17, no. 3, pp. 495-509 (Journal Article; JCR Q1, Computer Science, Artificial Intelligence; impact factor 4.9)
Published: 2024-10-02
DOI: 10.1109/TCDS.2024.3471645
URL: https://ieeexplore.ieee.org/document/10704571/
We present progressive diversity reinforcement learning (PDRL), an unsupervised reinforcement learning (URL) method for discovering diverse skills. PDRL encourages learning behaviors that span multiple steps, in particular by introducing “deeper states”: states that require a longer sequence of actions to reach without repetition. To address weak skill diversity and weak exploration in partially observable environments, PDRL uses two signals for skill learning that foster exploration and skill diversity, emphasizing the accuracy of each observation and subtrajectory relative to its predecessor. Skill latent variables are represented by mappings from states or trajectories, which helps to distinguish and recover learned skills. This dual representation promotes exploration and skill diversity without additional modeling or prior knowledge. PDRL also builds its intrinsic rewards from a combination of observations and subtrajectories, effectively preventing skill duplication. Experiments across multiple benchmarks show that PDRL discovers a broader range of skills than existing methods. Additionally, pretraining with PDRL accelerates fine-tuning in goal-conditioned reinforcement learning (GCRL) tasks, as demonstrated on Fetch robotic manipulation tasks.
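The abstract does not give the reward formula, so the following is only a rough sketch of how an intrinsic reward could combine an observation-level term with a subtrajectory-level term. It substitutes a generic discriminator-based reward of the form log q(z|x) - log p(z), familiar from earlier skill-discovery work such as DIAYN, and should not be read as PDRL's actual objective. The function name `skill_reward`, the two discriminator probabilities, the uniform skill prior, and the mixing `weight` are all hypothetical.

```python
import math


def skill_reward(q_z_given_obs, q_z_given_subtraj, num_skills, weight=0.5):
    """Hypothetical combined intrinsic reward for the currently active skill.

    q_z_given_obs:     probability a discriminator assigns to the active skill
                       given a single observation (assumed input).
    q_z_given_subtraj: probability a second discriminator assigns to the active
                       skill given a subtrajectory (assumed input).
    num_skills:        number of discrete skills; a uniform prior is assumed.
    weight:            illustrative mixing coefficient, not from the paper.
    """
    for p in (q_z_given_obs, q_z_given_subtraj):
        if not 0.0 < p <= 1.0:
            raise ValueError("discriminator probabilities must lie in (0, 1]")
    if num_skills < 1:
        raise ValueError("num_skills must be at least 1")

    log_prior = -math.log(num_skills)  # log p(z) under a uniform prior
    obs_term = math.log(q_z_given_obs) - log_prior
    traj_term = math.log(q_z_given_subtraj) - log_prior
    # Weighted sum of the observation-level and subtrajectory-level terms.
    return weight * obs_term + (1.0 - weight) * traj_term


if __name__ == "__main__":
    # A skill that both discriminators identify confidently earns more reward
    # than one they can barely tell apart from chance (1 / num_skills).
    print(skill_reward(0.9, 0.8, num_skills=4))
    print(skill_reward(0.25, 0.25, num_skills=4))
```

Under these assumptions the reward is zero when both discriminators are at chance level and grows as the active skill becomes easier to identify, which is the usual mechanism by which such rewards discourage duplicated skills.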
About the journal:
The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.