Decentralized learning in multiple pursuer-evader Markov games

S. Givigi, H. Schwartz
DOI: 10.1109/MED.2011.5983135
Published in: 2011 19th Mediterranean Conference on Control & Automation (MED)
Publication date: 2011-06-20
Citations: 3

Abstract

We represent the multiple pursuer and evader game as a Markov game and each player as a decentralized unit that has to work independently in order to complete a task. Most proposed solutions for this distributed multiagent decision problem require some sort of central coordination. In this paper, we model each player as a learning automaton (LA) and let the players evolve and adapt in order to solve the difficult problem at hand. We also show that, under the proposed learning process, the players' policies converge to an equilibrium point. Simulations of such scenarios with multiple pursuers and evaders are presented in order to show the feasibility of the approach.
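The abstract does not specify which learning-automata update rule the authors use, but a common reinforcement scheme for learning automata is linear reward-inaction (L_R-I): on a rewarded play, probability mass shifts toward the chosen action; on a penalized play, the action probabilities are left unchanged. The sketch below is a minimal, hypothetical illustration of that scheme for a single automaton with four actions (e.g. the N/S/E/W moves of one pursuer); the reward signal here is a toy stand-in, not the paper's game payoff.

```python
import random

def lri_update(probs, action, reward, rate=0.1):
    """Linear reward-inaction (L_R-I) update for one learning automaton.

    On reward (reward=1), move probability mass toward the chosen action;
    on penalty (reward=0), leave the probability vector unchanged.
    The update preserves sum(probs) == 1.
    """
    if reward:
        probs = [p + rate * (1.0 - p) if i == action else p * (1.0 - rate)
                 for i, p in enumerate(probs)]
    return probs

# Toy demo: an automaton with 4 actions that is rewarded whenever it
# happens to pick action 2 (a stand-in for the game's payoff signal).
random.seed(0)
probs = [0.25, 0.25, 0.25, 0.25]
for _ in range(500):
    a = random.choices(range(4), weights=probs)[0]
    r = 1 if a == 2 else 0
    probs = lri_update(probs, a, r)
print(probs)  # probability of action 2 grows toward 1
```

In the decentralized setting described by the paper, each player would run its own automaton of this kind, with rewards coming from the pursuit-evasion game itself rather than a fixed target action.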