Literature Review of OpenAI Five’s Mechanisms in Dota 2’s Bot Player
Edbert Felix Fangasadha, Steffi Soeroredjo, Anderies, A. A. Gunawan
2022 International Seminar on Application for Technology of Information and Communication (iSemantic), 17 September 2022
DOI: 10.1109/iSemantic55962.2022.9920480
Abstract
Multiplayer Online Battle Arena (MOBA) games, such as Dota 2, present significant challenges to AI systems, including multi-agent coordination, a massive state-action space, and sophisticated action control. These challenges will become increasingly important in the development of more powerful AI systems. OpenAI Five has demonstrated that Deep Reinforcement Learning (DRL) agents can be trained to superhuman competence in matches that involve thousands of steps before reaching the end goal, without the need for explicit hierarchical macro-actions. These DRL agents generally receive high-dimensional inputs at each step and act on deep-neural-network-based policies that the learning mechanism updates end to end to maximize the return. This paper investigates the approaches OpenAI Five employs to gradually acquire knowledge during training: (1) using surgeries to carry trained parameters across game updates, (2) treating training settings as hyperparameters rather than ordinary parameters, since they cannot be learned directly by the training process, and (3) making decisions using policies in addition to macro strategies. Finally, the paper describes how the agents receive and respond to the observations and actions occurring in each match, together with an explanation of the dense reward function for multi-agent cooperation, constructed using the zero-sum technique.
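To make the zero-sum technique concrete, below is a minimal sketch (not taken from the paper) of one common way to shape per-hero dense rewards so that they sum to zero across the two teams: each hero's shaped reward is its own raw reward minus the mean raw reward of the opposing team. The function name zero_sum_rewards and the example values are illustrative assumptions.

import numpy as np

def zero_sum_rewards(radiant_rewards, dire_rewards):
    """Make per-hero dense rewards zero-sum across the two teams.

    Each hero's shaped reward is its own raw reward minus the mean raw
    reward of the opposing team, so the sum over all ten heroes is zero.
    """
    radiant = np.asarray(radiant_rewards, dtype=np.float64)
    dire = np.asarray(dire_rewards, dtype=np.float64)
    shaped_radiant = radiant - dire.mean()
    shaped_dire = dire - radiant.mean()
    return shaped_radiant, shaped_dire

# Illustrative raw dense rewards (e.g. from gold, last hits, kills) for
# each of the five heroes on each team at one timestep.
radiant, dire = zero_sum_rewards([0.4, 0.1, 0.0, 0.2, 0.3],
                                 [0.1, 0.0, 0.1, 0.0, 0.3])
assert abs(radiant.sum() + dire.sum()) < 1e-9  # zero-sum across teams

Because the shaped rewards of the two teams cancel exactly, any dense reward one team earns is a symmetric penalty for its opponents, which encourages cooperative play toward team-level gains rather than purely individual ones.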