{"title":"多智能体强化学习的多样化经验","authors":"N. A. V. Suryanarayanan, H. Iba","doi":"10.1109/IWCIA47330.2019.8955073","DOIUrl":null,"url":null,"abstract":"Deep Reinforcement learning algorithms have traditionally been applied to tasks that train challenging control behavior. Actor Critic based versions of these algorithms have been used to train agents in state of the art settings. While proving to be sample efficient in multi agent learning, these algorithms tend to perform poorly in the exploration phases. In this paper, the experience gained by the replay buffer during the exploration phase is improved by diversifying the input results using a genetic algorithm. We have tested this method on predator prey environment and other team based tasks. The evaluation shows that our method tends to produce a more robust solutions outperforming the traditional methods.","PeriodicalId":139434,"journal":{"name":"2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Diversifying experiences in multi agent reinforcement learning\",\"authors\":\"N. A. V. Suryanarayanan, H. Iba\",\"doi\":\"10.1109/IWCIA47330.2019.8955073\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Reinforcement learning algorithms have traditionally been applied to tasks that train challenging control behavior. Actor Critic based versions of these algorithms have been used to train agents in state of the art settings. While proving to be sample efficient in multi agent learning, these algorithms tend to perform poorly in the exploration phases. In this paper, the experience gained by the replay buffer during the exploration phase is improved by diversifying the input results using a genetic algorithm. We have tested this method on predator prey environment and other team based tasks. The evaluation shows that our method tends to produce a more robust solutions outperforming the traditional methods.\",\"PeriodicalId\":139434,\"journal\":{\"name\":\"2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA)\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IWCIA47330.2019.8955073\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IWCIA47330.2019.8955073","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Diversifying experiences in multi agent reinforcement learning
Deep reinforcement learning algorithms have traditionally been applied to tasks that require learning challenging control behavior. Actor-critic versions of these algorithms have been used to train agents in state-of-the-art settings. While sample efficient in multi-agent learning, these algorithms tend to perform poorly during the exploration phase. In this paper, the experience gathered in the replay buffer during the exploration phase is improved by diversifying the collected results using a genetic algorithm. We have tested this method on a predator-prey environment and other team-based tasks. The evaluation shows that our method tends to produce more robust solutions, outperforming the traditional methods.
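The abstract does not spell out how the genetic algorithm diversifies the exploration-phase experiences, so the following is only a minimal illustrative sketch of one plausible reading: candidate exploration actions are selected for novelty, recombined, and mutated before the resulting transitions are stored in a shared replay buffer. All names (novelty_score, evolve_actions), the toy environment step, and every hyperparameter are assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's released code): diversifying
# exploration experiences with a genetic algorithm before they enter a
# shared replay buffer. All names and hyperparameters are illustrative.
import random
from collections import deque

import numpy as np


def novelty_score(action, archive):
    """Mean Euclidean distance to previously stored actions (higher = more novel)."""
    if not archive:
        return 0.0
    return float(np.mean([np.linalg.norm(action - a) for a in archive]))


def evolve_actions(population, archive, n_offspring=16,
                   mutation_std=0.1, action_low=-1.0, action_high=1.0):
    """Produce diversified candidate actions via novelty selection, crossover, mutation."""
    ranked = sorted(population, key=lambda a: novelty_score(a, archive), reverse=True)
    parents = ranked[:max(2, len(ranked) // 2)]            # keep the most novel half
    offspring = []
    for _ in range(n_offspring):
        p1, p2 = random.sample(parents, 2)
        mask = np.random.rand(p1.shape[0]) < 0.5           # uniform crossover
        child = np.where(mask, p1, p2)
        child = child + np.random.normal(0.0, mutation_std, size=child.shape)
        offspring.append(np.clip(child, action_low, action_high))
    return offspring


if __name__ == "__main__":
    action_dim = 2
    replay_buffer = deque(maxlen=100_000)   # stores (state, action, reward, next_state)
    archive = []                            # actions already tried, used for novelty

    # Toy stand-in for an environment step; a real setup would roll out the
    # multi-agent environment (e.g. predator-prey) instead.
    def fake_step(action):
        state = np.zeros(4)
        next_state = np.random.randn(4)
        reward = -np.linalg.norm(action)
        return state, reward, next_state

    population = [np.random.uniform(-1, 1, action_dim) for _ in range(16)]
    for generation in range(5):
        population = evolve_actions(population, archive)
        for action in population:
            state, reward, next_state = fake_step(action)
            replay_buffer.append((state, action, reward, next_state))
            archive.append(action)
    print(f"replay buffer now holds {len(replay_buffer)} diversified transitions")
```

In this reading, the actor-critic learner would then sample from the diversified replay buffer exactly as it normally would; only the way exploration data is generated changes.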