{"title":"基于层次强化学习的动态地下城爬行游戏","authors":"R. Niel, M. Wiering","doi":"10.1109/SSCI.2018.8628914","DOIUrl":null,"url":null,"abstract":"This paper describes a novel hierarchical reinforcement learning (HRL) algorithm for training an autonomous agent to play a dungeon crawler game. As opposed to most previous HRL frameworks, the proposed HRL system does not contain complex actions that take multiple time steps. Instead there is a hierarchy of behaviours which can either execute an action or delegate the decision to a sub-behaviour lower in the hierarchy. The actions or sub-behaviours are chosen by learning the estimated cumulative reward. Since each action only takes one time step and the system starts at the top of the hierarchy at every time step, the system is able to dynamically react to changes in its environment. The developed dungeon crawler game requires the agent to take keys, open doors, and go to the exit while evading or fighting with enemy units. Based on these tasks, behaviours are constructed and trained with a combination of multi-layer perceptrons and Q-learning. The system also uses a kind of multi-bjective learning that allows multiple parts of the hierarchy to simultaneously learn from a chosen action using their own reward function. The performance of the system is compared to an agent using MaxQ-learning that shares a similar overall design. The results show that the proposed dynamic HRL (dHRL) system yields much higher scores and win rates in different game levels and is able to learn to perform very well with only 500 training games.","PeriodicalId":235735,"journal":{"name":"2018 IEEE Symposium Series on Computational Intelligence (SSCI)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Hierarchical Reinforcement Learning for Playing a Dynamic Dungeon Crawler Game\",\"authors\":\"R. Niel, M. Wiering\",\"doi\":\"10.1109/SSCI.2018.8628914\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes a novel hierarchical reinforcement learning (HRL) algorithm for training an autonomous agent to play a dungeon crawler game. As opposed to most previous HRL frameworks, the proposed HRL system does not contain complex actions that take multiple time steps. Instead there is a hierarchy of behaviours which can either execute an action or delegate the decision to a sub-behaviour lower in the hierarchy. The actions or sub-behaviours are chosen by learning the estimated cumulative reward. Since each action only takes one time step and the system starts at the top of the hierarchy at every time step, the system is able to dynamically react to changes in its environment. The developed dungeon crawler game requires the agent to take keys, open doors, and go to the exit while evading or fighting with enemy units. Based on these tasks, behaviours are constructed and trained with a combination of multi-layer perceptrons and Q-learning. The system also uses a kind of multi-bjective learning that allows multiple parts of the hierarchy to simultaneously learn from a chosen action using their own reward function. The performance of the system is compared to an agent using MaxQ-learning that shares a similar overall design. 
The results show that the proposed dynamic HRL (dHRL) system yields much higher scores and win rates in different game levels and is able to learn to perform very well with only 500 training games.\",\"PeriodicalId\":235735,\"journal\":{\"name\":\"2018 IEEE Symposium Series on Computational Intelligence (SSCI)\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE Symposium Series on Computational Intelligence (SSCI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SSCI.2018.8628914\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Symposium Series on Computational Intelligence (SSCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SSCI.2018.8628914","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hierarchical Reinforcement Learning for Playing a Dynamic Dungeon Crawler Game
This paper describes a novel hierarchical reinforcement learning (HRL) algorithm for training an autonomous agent to play a dungeon crawler game. As opposed to most previous HRL frameworks, the proposed HRL system does not contain complex actions that take multiple time steps. Instead, there is a hierarchy of behaviours, each of which can either execute an action or delegate the decision to a sub-behaviour lower in the hierarchy. Actions or sub-behaviours are chosen based on learned estimates of the cumulative reward. Since each action takes only one time step and the system starts at the top of the hierarchy at every time step, it can react dynamically to changes in its environment. The developed dungeon crawler game requires the agent to collect keys, open doors, and reach the exit while evading or fighting enemy units. Based on these tasks, behaviours are constructed and trained with a combination of multi-layer perceptrons and Q-learning. The system also uses a form of multi-objective learning that allows multiple parts of the hierarchy to simultaneously learn from a chosen action using their own reward function. The performance of the system is compared to that of an agent using MaxQ-learning with a similar overall design. The results show that the proposed dynamic HRL (dHRL) system yields much higher scores and win rates across different game levels and learns to perform very well with only 500 training games.
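
The decision procedure described in the abstract can be sketched in code. The following is a minimal, illustrative Python sketch, not the authors' implementation: it replaces the paper's multi-layer perceptron value functions with simple Q-tables, assumes a hashable state representation, and all names (Behaviour, act_and_learn, env_step, reward_fn) are hypothetical.

```python
# Sketch of the per-time-step decision flow: every step the agent re-enters the
# hierarchy at the root, each behaviour either picks a primitive action or
# delegates to a sub-behaviour, and every behaviour along the chosen path then
# updates its own Q-estimate with its own reward function (multi-objective learning).
import random
from collections import defaultdict


class Behaviour:
    """One node in the behaviour hierarchy.

    `options` maps a choice label to either a primitive action (a string) or a
    child Behaviour. `reward_fn(state, next_state)` is this node's own reward
    signal, so different parts of the hierarchy learn different objectives
    from the same executed action.
    """

    def __init__(self, name, options, reward_fn,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.name = name
        self.options = options            # label -> action string or Behaviour
        self.reward_fn = reward_fn
        self.q = defaultdict(float)       # (state, label) -> estimated return
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select(self, state):
        """Epsilon-greedy choice among this node's options."""
        labels = list(self.options)
        if random.random() < self.epsilon:
            return random.choice(labels)
        return max(labels, key=lambda l: self.q[(state, l)])

    def update(self, state, label, reward, next_state):
        """One-step Q-learning update using this behaviour's own reward."""
        best_next = max(self.q[(next_state, l)] for l in self.options)
        target = reward + self.gamma * best_next
        self.q[(state, label)] += self.alpha * (target - self.q[(state, label)])


def act_and_learn(root, state, env_step):
    """Walk the hierarchy from the root, execute the chosen primitive action
    (one time step), then let every visited behaviour learn from the outcome."""
    path = []                              # (behaviour, chosen label) per level
    node = root
    while True:
        label = node.select(state)
        path.append((node, label))
        chosen = node.options[label]
        if isinstance(chosen, Behaviour):  # delegate one level down
            node = chosen
        else:                              # primitive action reached
            next_state = env_step(state, chosen)
            break
    for behaviour, label in path:          # each node uses its own reward function
        r = behaviour.reward_fn(state, next_state)
        behaviour.update(state, label, r, next_state)
    return next_state
```

Re-entering the hierarchy at the root on every time step, rather than committing to a multi-step option, is what the abstract credits for the agent's dynamic behaviour: the chosen path through the hierarchy can change as soon as the situation does, for example switching from heading to the exit to evading an approaching enemy.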