{"title":"Modified Annealed Adversarial Bonus for Adversarially Guided Actor-Critic","authors":"Qian Zhao, Fanyu Zeng, Mao Xu, Jinhui Han","doi":"10.1109/YAC57282.2022.10023796","DOIUrl":null,"url":null,"abstract":"This paper investigates learning efficiency for rein-forcement learning in procedurally generated environments. A more sophisticated method is proposed to adjust the adversarial bonus to promote learning efficiency instead of the linearly decayed scheme in adversarially guided actor-critic. Our method considers the relationship between the bonus adjustment and the learning procedure. In some environments, if an agent performs better in learning, the agent will reach the goal with fewer steps. If the length of the episode decreases, the adversarial bonus will be reduced in our method. In this way, the learning efficiency has been improved in some procedurally generated tasks. Several experiments are implemented in MiniGrid to verify the proposed method. In the experiments, the proposed method outperforms the existing adversarially guided methods in several challenging procedurally-generated tasks.","PeriodicalId":272227,"journal":{"name":"2022 37th Youth Academic Annual Conference of Chinese Association of Automation (YAC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 37th Youth Academic Annual Conference of Chinese Association of Automation (YAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/YAC57282.2022.10023796","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
This paper investigates learning efficiency for reinforcement learning in procedurally generated environments. Instead of the linearly decayed scheme used in adversarially guided actor-critic, a more sophisticated method is proposed to adjust the adversarial bonus and thereby improve learning efficiency. Our method ties the bonus adjustment to the learning progress itself. In some environments, an agent that has learned more reaches the goal in fewer steps; accordingly, when episode length decreases, our method reduces the adversarial bonus. In this way, learning efficiency is improved on several procedurally generated tasks. Experiments in MiniGrid verify the proposed method, which outperforms the existing adversarially guided methods on several challenging procedurally generated tasks.
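To make the idea concrete, the sketch below illustrates one possible way to anneal an adversarial bonus coefficient from recent episode lengths rather than from a fixed linear schedule, so that shorter episodes (better learning) yield a smaller bonus. This is a minimal illustration under assumed names and an assumed update rule (class name, window size, and the length-ratio scaling are not taken from the paper), not the authors' actual formulation.

```python
import numpy as np


class EpisodeLengthAnnealedBonus:
    """Hypothetical sketch: scale the adversarial bonus coefficient by the
    average recent episode length, so the bonus shrinks as the agent reaches
    the goal in fewer steps. The update rule is an assumption for
    illustration, not the method from the paper."""

    def __init__(self, initial_coef=1.0, max_episode_steps=500, window=100):
        self.initial_coef = initial_coef          # starting bonus weight
        self.max_episode_steps = max_episode_steps
        self.window = window                      # number of episodes to average over
        self.recent_lengths = []                  # sliding window of episode lengths

    def update(self, episode_length):
        # Record the latest episode length, keeping only the most recent ones.
        self.recent_lengths.append(episode_length)
        self.recent_lengths = self.recent_lengths[-self.window:]

    def coefficient(self):
        # Shorter average episodes -> smaller adversarial bonus coefficient.
        if not self.recent_lengths:
            return self.initial_coef
        mean_len = np.mean(self.recent_lengths)
        return self.initial_coef * mean_len / self.max_episode_steps


# Usage sketch: the coefficient would weight an AGAC-style adversarial bonus,
# e.g. r_shaped = r_env + annealer.coefficient() * adversarial_bonus.
annealer = EpisodeLengthAnnealedBonus(initial_coef=1.0, max_episode_steps=256)
annealer.update(episode_length=180)
print(annealer.coefficient())  # decreases as episodes get shorter
```

Compared with a purely linear decay, this kind of schedule adapts to the agent's actual progress: the bonus stays large while episodes remain long and only tapers off once the agent solves tasks more quickly.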