Improving Reinforcement Learning Exploration by Autoencoders
Authors: Gabor Paczolay, István Harmati
Journal: Periodica Polytechnica Electrical Engineering and Computer Science
Publication date: 2024-07-01
DOI: 10.3311/ppee.36789 (https://doi.org/10.3311/ppee.36789)
Citations: 0
Abstract
Reinforcement learning has great potential for solving engineering problems without domain knowledge. However, the exploration-exploitation trade-off arises whenever a system must balance learning against proper execution. This paper proposes a new method that uses autoencoders to manage the exploration rate of an epsilon-greedy exploration algorithm. The error between the real state and the state reconstructed by the autoencoder serves as the basis for the exploration-exploitation rate. The proposed method is examined in two experiments: one is the cartpole benchmark, and the other is a gridworld example created for this paper to examine long-term exploration. The results of both experiments indicate that the proposed method performs better in these scenarios.
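The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny linear autoencoder, the exponential mapping in `epsilon_from_error`, and every parameter name and value (`eps_min`, `eps_max`, `scale`, `lr`, the 4-dimensional example state) are assumptions chosen only to show how a reconstruction error could drive the exploration rate of an epsilon-greedy policy.

```python
import numpy as np


class LinearAutoencoder:
    """Tiny linear autoencoder (an illustrative stand-in, not the paper's model)."""

    def __init__(self, state_dim, latent_dim, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=0.1, size=(state_dim, latent_dim))
        self.W_dec = rng.normal(scale=0.1, size=(latent_dim, state_dim))
        self.lr = lr

    def reconstruct(self, state):
        return state @ self.W_enc @ self.W_dec

    def error(self, state):
        """Mean squared error between the state and its reconstruction."""
        diff = self.reconstruct(state) - state
        return float(np.mean(diff ** 2))

    def update(self, state):
        """One gradient step on 0.5 * squared reconstruction error."""
        z = state @ self.W_enc
        diff = z @ self.W_dec - state
        # Compute both gradients before changing either weight matrix.
        grad_dec = np.outer(z, diff)
        grad_enc = np.outer(state, diff @ self.W_dec.T)
        self.W_dec -= self.lr * grad_dec
        self.W_enc -= self.lr * grad_enc


def epsilon_from_error(err, eps_min=0.05, eps_max=1.0, scale=1.0):
    """Assumed mapping: larger reconstruction error gives a larger epsilon."""
    return eps_min + (eps_max - eps_min) * (1.0 - np.exp(-err / scale))


def select_action(q_values, epsilon, rng):
    """Standard epsilon-greedy choice over a vector of action values."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


# Demo: a state that is seen repeatedly becomes easier to reconstruct,
# so the exploration rate assigned to it shrinks.
autoencoder = LinearAutoencoder(state_dim=4, latent_dim=2)
state = np.array([1.0, 0.5, -0.5, 0.2])  # hypothetical 4-dimensional state
eps_before = epsilon_from_error(autoencoder.error(state))
for _ in range(500):
    autoencoder.update(state)
eps_after = epsilon_from_error(autoencoder.error(state))

rng = np.random.default_rng(0)
action = select_action(np.array([0.1, 0.9, 0.3]), eps_after, rng)
```

Under these assumptions, states the autoencoder reconstructs poorly are treated as unfamiliar and trigger more random actions, while well-reconstructed states shift the agent toward the greedy action.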
About the journal:
The journal publishes original research articles in the broad field of electrical engineering and informatics that fit into one of the following Sections of the Journal: (i) Communication systems, networks and technology, (ii) Computer science and information theory, (iii) Control, signal processing and signal analysis, medical applications, (iv) Components, Microelectronics and Material Sciences, (v) Power engineering and mechatronics, (vi) Mobile Software, Internet of Things and Wearable Devices, (vii) Solid-state lighting, and (viii) Vehicular Technology (land, airborne, and maritime mobile services; automotive, radar systems; antennas and radio wave propagation).