Learning Upper-Level Policy using Importance Sampling-based Policy Search Method
Jose Pastor, H. Díaz, L. Armesto, A. Esparza, A. Sala
2018 7th International Conference on Systems and Control (ICSC), October 2018
DOI: 10.1109/ICOSC.2018.8587772
Abstract
Policy search methods are a successful approach to reinforcement learning. They allow learning upper-level policies, whose main advantage is that these distributions explore directly in the parameter space. The contribution of this paper is an algorithm based on importance sampling methods and local linear regression that uses the samples efficiently. To achieve this aim, we propose to incorporate information from all past samples into the learning process using importance sampling methods. Additionally, we use the gradient direction of the local linear reward model to explore regions where the predicted reward could be higher.
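To make the idea concrete, below is a minimal sketch of this kind of upper-level policy search: a Gaussian distribution over controller parameters is updated by reweighting all past samples with importance-sampling ratios, fitting a weighted local linear model of the reward, and stepping the policy mean along that model's gradient. The toy reward, sample sizes, and step size are illustrative assumptions, not the paper's exact update rules, which the abstract does not specify.

```python
import numpy as np

# Sketch: importance-sampling-based upper-level policy search with a
# weighted local linear reward model. Hyperparameters and the toy
# reward are placeholders, not taken from the paper.

def reward(theta):
    # Toy reward peaking at theta = [1, -1]; stands in for a rollout return.
    return -np.sum((theta - np.array([1.0, -1.0])) ** 2)

dim = 2
mean = np.zeros(dim)   # mean of the Gaussian upper-level policy
cov = np.eye(dim)      # exploration covariance in parameter space
lr = 0.5               # step size along the model's gradient direction

def gauss_pdf(x, m, c):
    d = x - m
    return np.exp(-0.5 * d @ np.linalg.solve(c, d)) / np.sqrt(
        (2 * np.pi) ** dim * np.linalg.det(c))

samples, rewards, gen_means, gen_covs = [], [], [], []

for it in range(30):
    # Draw new parameter samples from the current upper-level policy
    # and remember which policy generated each sample.
    for th in np.random.multivariate_normal(mean, cov, size=10):
        samples.append(th)
        rewards.append(reward(th))
        gen_means.append(mean.copy())
        gen_covs.append(cov.copy())

    X = np.array(samples)
    R = np.array(rewards)

    # Importance-sampling weights: reuse ALL past samples by weighting each
    # with the ratio (current-policy density) / (generating-policy density).
    w = np.array([gauss_pdf(x, mean, cov) / max(gauss_pdf(x, m, c), 1e-12)
                  for x, m, c in zip(X, gen_means, gen_covs)])
    w /= w.sum()

    # Weighted local linear regression of the reward: R ≈ a + X @ b.
    A = np.hstack([np.ones((len(X), 1)), X])
    W = np.diag(w)
    coef, *_ = np.linalg.lstsq(A.T @ W @ A, A.T @ W @ R, rcond=None)
    b = coef[1:]  # gradient of the local linear reward model

    # Move the policy mean along the model's gradient direction, i.e.
    # toward the region where the predicted reward is higher.
    mean = mean + lr * b / (np.linalg.norm(b) + 1e-12)

print("final policy mean:", mean)
```

The importance weights are what make sample reuse sound: old samples drawn from earlier policies still contribute, but proportionally to how likely they are under the current policy, so the regression is not biased toward stale exploration regions.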