{"title":"基于q -不相关抽象的强化学习表示","authors":"Shuai Hao, Luntong Li, Minsong Liu, Yuanheng Zhu, Dongbin Zhao","doi":"10.1109/ICICIP53388.2021.9642160","DOIUrl":null,"url":null,"abstract":"In order to improve the performance of deep reinforcement learning (DRL) algorithm in high-dimensional observation environments, we propose a new auxiliary task to learn representations to aggregate task-relevant information of observations. Inspired by Q-irrelevance abstraction, our auxiliary task trains a deep Q-network (DQN) to predict the true Q value distribution over all discrete actions. Then we use the output of DQN to train the encoder to discriminate states with different Q values. The encoder is used as the representation of proximal policy optimization (PPO). The resulting algorithm is called as Q-irrelevance Abstraction for Reinforcement Learning (QIARL). After training, the encoder can aggregate states with similar Q value distributions together for any policy and any action. Thus the encoder can encode the important information that is relevant to reinforcement learning task. We test QIARL in four Procgen environments compare with PPO, A2C and Rainbow. The experimental results show QIARL outperforms the other three algorithms.","PeriodicalId":435799,"journal":{"name":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)","volume":"30 52","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Learning Representation with Q-irrelevance Abstraction for Reinforcement Learning\",\"authors\":\"Shuai Hao, Luntong Li, Minsong Liu, Yuanheng Zhu, Dongbin Zhao\",\"doi\":\"10.1109/ICICIP53388.2021.9642160\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In order to improve the performance of deep reinforcement learning (DRL) algorithm in high-dimensional observation environments, we propose a new auxiliary task to learn representations to aggregate task-relevant information of observations. Inspired by Q-irrelevance abstraction, our auxiliary task trains a deep Q-network (DQN) to predict the true Q value distribution over all discrete actions. Then we use the output of DQN to train the encoder to discriminate states with different Q values. The encoder is used as the representation of proximal policy optimization (PPO). The resulting algorithm is called as Q-irrelevance Abstraction for Reinforcement Learning (QIARL). After training, the encoder can aggregate states with similar Q value distributions together for any policy and any action. Thus the encoder can encode the important information that is relevant to reinforcement learning task. We test QIARL in four Procgen environments compare with PPO, A2C and Rainbow. 
The experimental results show QIARL outperforms the other three algorithms.\",\"PeriodicalId\":435799,\"journal\":{\"name\":\"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)\",\"volume\":\"30 52\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICICIP53388.2021.9642160\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 11th International Conference on Intelligent Control and Information Processing (ICICIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICICIP53388.2021.9642160","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Learning Representation with Q-irrelevance Abstraction for Reinforcement Learning
To improve the performance of deep reinforcement learning (DRL) algorithms in environments with high-dimensional observations, we propose a new auxiliary task that learns representations aggregating task-relevant information from observations. Inspired by Q-irrelevance abstraction, the auxiliary task trains a deep Q-network (DQN) to predict the true Q-value distribution over all discrete actions. We then use the output of the DQN to train an encoder to discriminate between states with different Q values. The encoder serves as the representation for proximal policy optimization (PPO). The resulting algorithm is called Q-irrelevance Abstraction for Reinforcement Learning (QIARL). After training, the encoder aggregates states with similar Q-value distributions for any policy and any action, so it encodes the information that is relevant to the reinforcement learning task. We evaluate QIARL in four Procgen environments and compare it with PPO, A2C, and Rainbow; the experimental results show that QIARL outperforms the other three algorithms.
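A minimal sketch of the auxiliary task described in the abstract, assuming a PyTorch implementation. The network sizes, the MSE regression onto Q-value targets, the loss coefficient, and the names Encoder, QHead, and auxiliary_q_loss are illustrative assumptions, not the authors' code; the paper's exact loss and training schedule may differ.

# Sketch of the QIARL auxiliary task: an encoder shared with PPO is trained,
# via a DQN-style head, to match the Q-value vector over all discrete actions.
# All architecture choices and hyperparameters below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps a high-dimensional observation to a compact latent state."""
    def __init__(self, obs_dim: int, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, obs):
        return self.net(obs)

class QHead(nn.Module):
    """DQN-style head predicting one Q value per discrete action."""
    def __init__(self, latent_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, z):
        return self.net(z)

def auxiliary_q_loss(encoder, q_head, obs, target_q):
    """Auxiliary objective: regress the Q-value vector over all actions.

    target_q stands in for the "true" Q-value distribution the abstract
    mentions; in practice it would come from a DQN trained alongside PPO.
    Matching Q vectors pushes the encoder to map states with similar
    Q-value distributions to nearby latents (the Q-irrelevance abstraction).
    """
    z = encoder(obs)
    pred_q = q_head(z)
    return F.mse_loss(pred_q, target_q)

# Usage sketch: add the auxiliary loss to the PPO objective when updating
# the shared encoder (the 0.5 coefficient is an arbitrary illustrative choice).
if __name__ == "__main__":
    obs_dim, num_actions, batch = 64 * 64 * 3, 15, 32
    encoder, q_head = Encoder(obs_dim), QHead(64, num_actions)
    obs = torch.randn(batch, obs_dim)
    target_q = torch.randn(batch, num_actions)  # placeholder Q targets
    aux = auxiliary_q_loss(encoder, q_head, obs, target_q)
    # total_loss = ppo_loss + 0.5 * aux   (PPO update not shown here)
    aux.backward()
    print(float(aux))

The key design point suggested by the abstract is that the gradient of this auxiliary loss flows into the shared encoder, so observations that the DQN assigns similar Q-value distributions are pulled together in latent space before the PPO policy and value heads consume them.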