{"title":"Adaptive Average Exploration in Multi-Agent Reinforcement Learning","authors":"Garrett Hall, K. Holladay","doi":"10.1109/DASC50938.2020.9256721","DOIUrl":null,"url":null,"abstract":"The objective of this research project was to improve Multi-Agent Reinforcement Learning performance in the StarCraft II environment with respect to faster training times, greater stability, and higher win ratios by 1) creating an adaptive action selector we call Adaptive Average Exploration, 2) using experiences previously learned by a neural network via Transfer Learning, and 3) updating the network simultaneously with its random action selector epsilon. We describe how agents interact with the StarCraft II environment and the QMIX algorithm used to test our approaches. We compare our AAE action selection approach with the default epsilon greedy method used by QMIX. These approaches are used to train Transfer Learning (TL) agents under a variety of test cases. We evaluate our TL agents using a predefined set of metrics. Finally, we demonstrate the effects of updating the neural networks and epsilon together more frequently on network performance.","PeriodicalId":112045,"journal":{"name":"2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DASC50938.2020.9256721","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The objective of this research project was to improve Multi-Agent Reinforcement Learning performance in the StarCraft II environment, specifically faster training times, greater stability, and higher win ratios, by 1) creating an adaptive action selector we call Adaptive Average Exploration (AAE), 2) reusing experiences previously learned by a neural network via Transfer Learning, and 3) updating the network simultaneously with its random action selection parameter, epsilon. We describe how agents interact with the StarCraft II environment and the QMIX algorithm used to test our approaches. We compare our AAE action selection approach with the default epsilon-greedy method used by QMIX. These approaches are used to train Transfer Learning (TL) agents under a variety of test cases. We evaluate our TL agents using a predefined set of metrics. Finally, we demonstrate how updating the neural networks and epsilon together more frequently affects network performance.
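The abstract contrasts AAE with the epsilon-greedy exploration that QMIX uses by default. Below is a minimal Python sketch of that baseline: an epsilon-greedy action selector with the usual linearly annealed epsilon schedule, plus a hypothetical return-driven adaptive epsilon for illustration. The schedule constants and the adaptive rule are assumptions for illustration only; the abstract does not specify the actual AAE update.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy action.

    q_values: per-action Q-value estimates for one agent.
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def annealed_epsilon(step, start=1.0, finish=0.05, anneal_steps=50_000):
    """Standard linear annealing from start to finish over anneal_steps
    environment steps (constants are illustrative, not from the paper)."""
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (finish - start)

def adaptive_epsilon(recent_returns, base=1.0, floor=0.05, target_return=20.0):
    """Hypothetical adaptive variant: shrink epsilon as the running average of
    recent episode returns approaches a target. An illustrative stand-in only,
    not the paper's AAE rule."""
    if not recent_returns:
        return base
    avg = sum(recent_returns) / len(recent_returns)
    return max(floor, base * (1.0 - min(avg / target_return, 1.0)))

# Usage sketch: choose an action for one agent at a given training step.
q_estimates = [0.1, 0.7, 0.3]
action = epsilon_greedy(q_estimates, annealed_epsilon(step=10_000))
```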