Decentralized learning in multiple pursuer-evader Markov games

S. Givigi, H. Schwartz
DOI: 10.1109/MED.2011.5983135
Published in: 2011 19th Mediterranean Conference on Control & Automation (MED)
Publication date: 2011-06-20
Citations: 3

Abstract

We represent the multiple pursuer and evader game as a Markov game and each player as a decentralized unit that has to work independently in order to complete a task. Most proposed solutions for this distributed multiagent decision problem require some sort of central coordination. In this paper, we model each player as a learning automaton (LA) and let the players evolve and adapt in order to solve the difficult problem at hand. We also show that, under the proposed learning process, the players' policies converge to an equilibrium point. Simulations of such scenarios with multiple pursuers and evaders are presented in order to show the feasibility of the approach.
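The abstract does not specify which learning-automata update rule the authors use, but a common reinforcement scheme for learning automata is linear reward-inaction (L_R-I): on a rewarded play, probability mass shifts toward the chosen action; on a penalized play, the action probabilities are left unchanged. The sketch below is a minimal, hypothetical illustration of that scheme for a single automaton with four actions (e.g. the N/S/E/W moves of one pursuer); the reward signal here is a toy stand-in, not the paper's game payoff.

```python
import random

def lri_update(probs, action, reward, rate=0.1):
    """Linear reward-inaction (L_R-I) update for one learning automaton.

    On reward (reward=1), move probability mass toward the chosen action;
    on penalty (reward=0), leave the probability vector unchanged.
    The update preserves sum(probs) == 1.
    """
    if reward:
        probs = [p + rate * (1.0 - p) if i == action else p * (1.0 - rate)
                 for i, p in enumerate(probs)]
    return probs

# Toy demo: an automaton with 4 actions that is rewarded whenever it
# happens to pick action 2 (a stand-in for the game's payoff signal).
random.seed(0)
probs = [0.25, 0.25, 0.25, 0.25]
for _ in range(500):
    a = random.choices(range(4), weights=probs)[0]
    r = 1 if a == 2 else 0
    probs = lri_update(probs, a, r)
print(probs)  # probability of action 2 grows toward 1
```

In the decentralized setting described by the paper, each player would run its own automaton of this kind, with rewards coming from the pursuit-evasion game itself rather than a fixed target action.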