{"title":"An Approach to Optimize Replay Buffer in Value-Based Reinforcement Learning","authors":"Baicheng Chen, Tianhan Gao, Qingwei Mi","doi":"10.1109/SoSE59841.2023.10178657","DOIUrl":null,"url":null,"abstract":"Reinforcement Learning (RL) has seen numerous advancements in recent years, particularly in the area of value-based algorithms. A key component of these algorithms is the Replay Buffer, which stores past experiences to improve learning. In this paper, the authors explore an optimization method for the Replay Buffer that increases the learning efficiency of an agent by prioritizing experiences based on their training value (T). The authors test the proposed approach in two environments, a maze and Cartpole-v1, comparing it to traditional Q-learning and Deep Q-Networks (DQN) algorithms. The results demonstrate improvements in learning efficiency and training effects, showing potential for the application of the method in various RL scenarios.","PeriodicalId":181642,"journal":{"name":"2023 18th Annual System of Systems Engineering Conference (SoSe)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 18th Annual System of Systems Engineering Conference (SoSe)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SoSE59841.2023.10178657","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Reinforcement Learning (RL) has seen numerous advancements in recent years, particularly in the area of value-based algorithms. A key component of these algorithms is the Replay Buffer, which stores past experiences to improve learning. In this paper, the authors explore an optimization method for the Replay Buffer that increases the learning efficiency of an agent by prioritizing experiences based on their training value (T). The authors test the proposed approach in two environments, a maze and Cartpole-v1, comparing it to traditional Q-learning and Deep Q-Networks (DQN) algorithms. The results demonstrate improvements in learning efficiency and training effects, showing potential for the application of the method in various RL scenarios.
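The abstract does not spell out how the training value T is computed or how the Replay Buffer uses it, so the following is only a minimal sketch of the general idea, assuming T behaves like the magnitude of the TD error and that transitions are sampled with probability proportional to T (as in prioritized experience replay). The class name `ValuePrioritizedReplayBuffer` and the use of |TD error| as a stand-in for T are illustrative assumptions, not the authors' actual method.

```python
import random
from collections import deque


class ValuePrioritizedReplayBuffer:
    """Sketch of a replay buffer that samples transitions with probability
    proportional to their training value T (approximated here by |TD error|)."""

    def __init__(self, capacity: int = 10_000):
        self.transitions = deque(maxlen=capacity)
        self.values = deque(maxlen=capacity)

    def push(self, transition, training_value: float) -> None:
        """Store a transition together with its (assumed) training value T."""
        self.transitions.append(transition)
        # Clamp to a small positive floor so every experience stays sampleable.
        self.values.append(max(training_value, 1e-6))

    def sample(self, batch_size: int):
        """Draw a batch; higher-T transitions are drawn more often, biasing
        updates toward experiences the agent can still learn the most from."""
        return random.choices(self.transitions, weights=self.values, k=batch_size)


if __name__ == "__main__":
    buf = ValuePrioritizedReplayBuffer(capacity=1_000)
    for step in range(500):
        transition = (step, 0, random.random(), step + 1)  # (s, a, r, s')
        td_error = random.random()  # stand-in for |TD error| used as T
        buf.push(transition, td_error)
    batch = buf.sample(32)
    print(len(batch), batch[0])
```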