{"title":"基于值的强化学习中重播缓冲区的优化方法","authors":"Baicheng Chen, Tianhan Gao, Qingwei Mi","doi":"10.1109/SoSE59841.2023.10178657","DOIUrl":null,"url":null,"abstract":"Reinforcement Learning (RL) has seen numerous advancements in recent years, particularly in the area of value-based algorithms. A key component of these algorithms is the Replay Buffer, which stores past experiences to improve learning. In this paper, the authors explore an optimization method for the Replay Buffer that increases the learning efficiency of an agent by prioritizing experiences based on their training value (T). The authors test the proposed approach in two environments, a maze and Cartpole-v1, comparing it to traditional Q-learning and Deep Q-Networks (DQN) algorithms. The results demonstrate improvements in learning efficiency and training effects, showing potential for the application of the method in various RL scenarios.","PeriodicalId":181642,"journal":{"name":"2023 18th Annual System of Systems Engineering Conference (SoSe)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Approach to Optimize Replay Buffer in Value-Based Reinforcement Learning\",\"authors\":\"Baicheng Chen, Tianhan Gao, Qingwei Mi\",\"doi\":\"10.1109/SoSE59841.2023.10178657\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement Learning (RL) has seen numerous advancements in recent years, particularly in the area of value-based algorithms. A key component of these algorithms is the Replay Buffer, which stores past experiences to improve learning. In this paper, the authors explore an optimization method for the Replay Buffer that increases the learning efficiency of an agent by prioritizing experiences based on their training value (T). The authors test the proposed approach in two environments, a maze and Cartpole-v1, comparing it to traditional Q-learning and Deep Q-Networks (DQN) algorithms. The results demonstrate improvements in learning efficiency and training effects, showing potential for the application of the method in various RL scenarios.\",\"PeriodicalId\":181642,\"journal\":{\"name\":\"2023 18th Annual System of Systems Engineering Conference (SoSe)\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 18th Annual System of Systems Engineering Conference (SoSe)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SoSE59841.2023.10178657\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 18th Annual System of Systems Engineering Conference (SoSe)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SoSE59841.2023.10178657","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Approach to Optimize Replay Buffer in Value-Based Reinforcement Learning
Reinforcement Learning (RL) has seen numerous advancements in recent years, particularly in the area of value-based algorithms. A key component of these algorithms is the Replay Buffer, which stores past experiences to improve learning. In this paper, the authors explore an optimization method for the Replay Buffer that increases an agent's learning efficiency by prioritizing experiences according to their training value (T). The proposed approach is tested in two environments, a maze and CartPole-v1, and compared against traditional Q-learning and Deep Q-Network (DQN) algorithms. The results demonstrate improvements in learning efficiency and training performance, showing the method's potential for application in a range of RL scenarios.
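To make the replay-prioritization idea concrete, below is a minimal sketch of a priority-based replay buffer. It is not the paper's implementation: the abstract does not define how the training value T is computed, so the sketch assumes a generic per-experience priority score (for example, the absolute TD error) as a stand-in, and the class name PrioritizedReplayBuffer and its parameters are illustrative.

# Minimal prioritized replay buffer sketch (not the paper's exact method).
# Transitions are sampled with probability proportional to a per-transition
# priority; |TD error| is assumed here as a proxy for the training value T.
from collections import namedtuple

import numpy as np

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class PrioritizedReplayBuffer:
    def __init__(self, capacity: int, eps: float = 1e-6):
        self.capacity = capacity
        self.eps = eps          # keeps every sampling probability above zero
        self.buffer = []        # stored transitions
        self.priorities = []    # one priority per stored transition
        self.pos = 0            # next write position (ring buffer)

    def add(self, transition: Transition, priority: float) -> None:
        """Store a transition with an initial priority (e.g., |TD error|)."""
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(priority + self.eps)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = priority + self.eps
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int):
        """Sample a batch, favoring transitions with higher priority."""
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        indices = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        batch = [self.buffer[i] for i in indices]
        return batch, indices

    def update_priorities(self, indices, new_priorities) -> None:
        """Refresh priorities after the sampled transitions have been replayed."""
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = float(p) + self.eps

    def __len__(self) -> int:
        return len(self.buffer)

In a DQN-style training loop, such a buffer would receive each new transition via add(), supply minibatches via sample(), and have update_priorities() called after fresh TD errors are computed, so that experiences judged more valuable for training are replayed more often.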