An approach to multi-agent pursuit evasion games using reinforcement learning

A. Bilgin, Esra Kadioglu Urtis
Published in: 2015 International Conference on Advanced Robotics (ICAR), 2015-07-27
DOI: 10.1109/ICAR.2015.7251450 (https://doi.org/10.1109/ICAR.2015.7251450)
Citations: 22

Abstract

The pursuit-evasion game has long been a popular research subject in robotics. Reinforcement learning, in which an agent learns from its interaction with the environment, is widely used in the pursuit-evasion domain. In this paper, we study the multi-agent pursuit-evasion problem using reinforcement learning and present experimental results. The intelligent agents use Watkins's Q(λ)-learning algorithm to learn from their interactions. Q-learning is an off-policy temporal-difference control algorithm. The method we use, in contrast, combines Q-learning with eligibility traces: it propagates backup information along the trajectory only until the first exploratory action occurs. In our work, concurrent learning is adopted for the pursuit team. In this approach, each member of the team has its own action-value function and updates its information space independently.
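The abstract names Watkins's Q(λ) and concurrent learning but gives no implementation details, so the following is only a minimal tabular sketch of the textbook form of the algorithm. The class name, the state and action encodings, the hyperparameter values, and the team size of three are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


class WatkinsQLambdaAgent:
    """Tabular Watkins's Q(lambda) learner (illustrative; names are assumptions)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.q = defaultdict(float)  # action-value table Q[(state, action)]
        self.e = defaultdict(float)  # eligibility traces e[(state, action)]
        self.rng = random.Random(seed)

    def greedy_value(self, state):
        """Largest action value available in the given state."""
        return max(self.q[(state, a)] for a in self.actions)

    def choose(self, state):
        """Epsilon-greedy action selection with random tie-breaking."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        best = self.greedy_value(state)
        return self.rng.choice([a for a in self.actions if self.q[(state, a)] == best])

    def update(self, s, a, r, s2, a2, done=False):
        """One backup for transition (s, a, r, s2); a2 is the action chosen in s2."""
        # Off-policy target: bootstrap from the greedy value of the next state.
        target = r if done else r + self.gamma * self.greedy_value(s2)
        delta = target - self.q[(s, a)]
        self.e[(s, a)] += 1.0  # accumulating trace for the visited pair
        # Decide before touching Q whether the next action is greedy.
        next_is_greedy = (not done) and self.q[(s2, a2)] == self.greedy_value(s2)
        for key in list(self.e):
            self.q[key] += self.alpha * delta * self.e[key]
            if next_is_greedy:
                self.e[key] *= self.gamma * self.lam  # decay and keep backing up
            else:
                del self.e[key]  # exploratory step or episode end: cut all traces


# Concurrent learning: every pursuer owns a separate learner and tables.
ACTIONS = ["up", "down", "left", "right"]  # hypothetical grid-world moves
team = [WatkinsQLambdaAgent(ACTIONS, seed=i) for i in range(3)]
```

In a training loop, each pursuer would call `choose` on its own observation and then `update` with its own reward, so no action values are shared across the team. Cutting the traces as soon as a non-greedy action is selected is what keeps the multi-step backup consistent with the greedy target policy.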