{"title":"一种基于强化学习的多智能体追逃博弈方法","authors":"A. Bilgin, Esra Kadioglu Urtis","doi":"10.1109/ICAR.2015.7251450","DOIUrl":null,"url":null,"abstract":"The game of pursuit-evasion has always been a popular research subject in the field of robotics. Reinforcement learning, which employs an agent's interaction with the environment, is a method widely used in pursuit-evasion domain. In this paper, a research is conducted on multi-agent pursuit-evasion problem using reinforcement learning and the experimental results are shown. The intelligent agents use Watkins's Q(λ)-learning algorithm to learn from their interactions. Q-learning is an off-policy temporal difference control algorithm. The method we utilize on the other hand, is a unified version of Q-learning and eligibility traces. It uses backup information until the first occurrence of an exploration. In our work, concurrent learning is adopted for the pursuit team. In this approach, each member of the team has got its own action-value function and updates its information space independently.","PeriodicalId":432004,"journal":{"name":"2015 International Conference on Advanced Robotics (ICAR)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"An approach to multi-agent pursuit evasion games using reinforcement learning\",\"authors\":\"A. Bilgin, Esra Kadioglu Urtis\",\"doi\":\"10.1109/ICAR.2015.7251450\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The game of pursuit-evasion has always been a popular research subject in the field of robotics. Reinforcement learning, which employs an agent's interaction with the environment, is a method widely used in pursuit-evasion domain. In this paper, a research is conducted on multi-agent pursuit-evasion problem using reinforcement learning and the experimental results are shown. The intelligent agents use Watkins's Q(λ)-learning algorithm to learn from their interactions. Q-learning is an off-policy temporal difference control algorithm. The method we utilize on the other hand, is a unified version of Q-learning and eligibility traces. It uses backup information until the first occurrence of an exploration. In our work, concurrent learning is adopted for the pursuit team. 
In this approach, each member of the team has got its own action-value function and updates its information space independently.\",\"PeriodicalId\":432004,\"journal\":{\"name\":\"2015 International Conference on Advanced Robotics (ICAR)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on Advanced Robotics (ICAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAR.2015.7251450\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Advanced Robotics (ICAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAR.2015.7251450","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An approach to multi-agent pursuit evasion games using reinforcement learning
The game of pursuit-evasion has long been a popular research subject in robotics. Reinforcement learning, in which an agent learns from its interactions with the environment, is a method widely used in the pursuit-evasion domain. In this paper, we study the multi-agent pursuit-evasion problem using reinforcement learning and present experimental results. The intelligent agents learn from their interactions using Watkins's Q(λ) algorithm. Q-learning is an off-policy temporal-difference control algorithm; the method we use unifies Q-learning with eligibility traces, propagating backups only until the first exploratory action. In our work, concurrent learning is adopted for the pursuit team: each member has its own action-value function and updates its information space independently.
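To make the learning rule concrete, below is a minimal tabular sketch of Watkins's Q(λ) for a single pursuer. The environment interface (env.reset, env.step), the state and action counts, and the hyperparameter values are illustrative assumptions, not the authors' implementation. The defining Watkins detail is that eligibility traces are cut to zero whenever the agent takes a non-greedy (exploratory) action, so backups flow only along greedy stretches of the trajectory.

```python
import numpy as np

def epsilon_greedy(Q, s, n_actions, epsilon):
    # Explore with probability epsilon, otherwise act greedily.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def watkins_q_lambda(env, n_states, n_actions, episodes=500,
                     alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1):
    # A sketch assuming env.reset() -> state and
    # env.step(a) -> (next_state, reward, done), with integer states.
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)               # eligibility traces, reset each episode
        s = env.reset()
        a = epsilon_greedy(Q, s, n_actions, epsilon)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, n_actions, epsilon)
            a_star = int(np.argmax(Q[s2]))  # greedy action in the next state
            # Off-policy Q-learning target: bootstrap on the greedy action.
            delta = r + gamma * Q[s2, a_star] * (not done) - Q[s, a]
            E[s, a] += 1.0                  # accumulating trace
            Q += alpha * delta * E          # back up the TD error along the trace
            if a2 == a_star:
                E *= gamma * lam            # decay traces while acting greedily
            else:
                E[:] = 0.0                  # Watkins's cut: zero traces on exploration
            s, a = s2, a2
    return Q
```

Under the concurrent-learning scheme described above, each member of the pursuit team would run this loop with its own independent Q and E tables, updating its information space separately; only the environment is shared.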