Multi-Objective Exploration for Proximal Policy Optimization
Nguyen Do Hoang Khoi, Cuong Pham Van, Hoang Vu Tran, C. Truong
2020 Applying New Technology in Green Buildings (ATiGB), pp. 105-109, published 2021-03-12. DOI: 10.1109/ATiGB50996.2021.9423319
In Reinforcement Learning, the reward signal is one of the main components used to optimize the policy. While other approaches rely on a single scalar reward to obtain an optimal policy, we propose a model that learns the designated reward under numerous conditions. Our method, which we call multi-objective exploration for proximal policy optimization (MOE-PPO), alleviates the dependence on reward design by executing the Preferent Surrogate Objective (PSO). We also make full use of Curiosity-Driven Exploration to improve the agent's exploration ability. Our experiments test MOE-PPO in the Super Mario Bros environment built on OpenAI Gym, using three criteria to illustrate our approach's effectiveness. The results show that MOE-PPO outperforms other on-policy algorithms under many conditions.
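The abstract combines three ingredients: a preference-weighted multi-objective reward, a curiosity-driven intrinsic bonus, and PPO's clipped surrogate objective. The sketch below shows one plausible way these pieces fit together. It is a minimal illustration only: the linear scalarization standing in for PSO, the ICM-style curiosity term, and all function names and parameters (`preference_weights`, `eta`, `clip_eps`) are assumptions, not the authors' actual implementation.

```python
import numpy as np

def scalarize_rewards(reward_vector, preference_weights):
    """Collapse a vector of per-objective rewards into one scalar.

    Linear scalarization is only one plausible reading of the paper's
    Preferent Surrogate Objective (PSO); the real formulation may differ.
    """
    return float(np.dot(preference_weights, reward_vector))

def curiosity_bonus(phi_next_pred, phi_next, eta=0.01):
    """ICM-style intrinsic reward: forward-model prediction error in feature space."""
    return eta * 0.5 * float(np.sum((phi_next_pred - phi_next) ** 2))

def ppo_clipped_surrogate(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    ratio = np.exp(log_prob_new - log_prob_old)
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage)

# Hypothetical single-step example: three extrinsic objectives
# (e.g., score, distance, coins in Super Mario Bros) plus curiosity.
extrinsic = np.array([1.0, 0.3, 0.0])   # per-objective rewards for one step
weights = np.array([0.5, 0.4, 0.1])     # preference over objectives
phi_pred = np.array([0.20, 0.10])       # forward model's predicted next features
phi_true = np.array([0.25, 0.05])       # actual next-state features
total_reward = scalarize_rewards(extrinsic, weights) + curiosity_bonus(phi_pred, phi_true)

# In practice the combined reward would feed a GAE-style advantage estimator;
# here it stands in directly for the advantage to keep the example short.
loss_term = -ppo_clipped_surrogate(log_prob_new=-0.9, log_prob_old=-1.0,
                                   advantage=total_reward)
print(total_reward, loss_term)
```

Under this reading, swapping in different preference vectors changes which objectives dominate the advantage signal, which is one way a method like MOE-PPO could reduce sensitivity to any single hand-designed reward.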