Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems

Aravind Venugopal, Elizabeth Bondi-Kelly, Harshavardhan Kamarthi, Keval Dholakia, Balaraman Ravindran, M. Tambe
{"title":"第20届自主智能体与多智能体系统国际会议论文集","authors":"Aravind Venugopal, Elizabeth Bondi-Kelly, Harshavardhan Kamarthi, Keval Dholakia, Balaraman Ravindran, M. Tambe","doi":"10.5555/3463952","DOIUrl":null,"url":null,"abstract":"Green Security Games (GSGs) have been successfully used in the protection of valuable resources such as fisheries, forests and wildlife. While real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication and real-time, uncertain information, previous game models do not fully address both of these stages simultaneously. Furthermore, adopting existing solution strategies is difficult since they do not scale well for larger, more complex variants of the game models. We therefore first propose a novel GSG model that combines defender allocation, patrolling, real-time drone notification to human patrollers, and drones sending warning signals to attackers. The model further incorporates uncertainty for real-time decision-making within a team of drones and human patrollers. Second, we present CombSGPO, a novel and scalable algorithm based on reinforcement learning, to compute a defender strategy for this game model. CombSGPO performs policy search over a multi-dimensional, discrete action space to compute an allocation strategy that is best suited to a best-response patrolling strategy for the defender, learnt by training a multi-agent Deep Q-Network. We show via experiments that CombSGPO converges to better strategies and is more scalable than comparable approaches. Third, we provide a detailed analysis of the coordination and signaling behavior learnt by CombSGPO, showing group formation between defender resources and patrolling formations based on signaling and notifications between resources. Importantly, we find that strategic signaling emerges in the final learnt strategy. Finally, we perform experiments to evaluate these strategies under different levels of uncertainty.","PeriodicalId":447893,"journal":{"name":"Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems\",\"authors\":\"Aravind Venugopal, Elizabeth Bondi-Kelly, Harshavardhan Kamarthi, Keval Dholakia, Balaraman Ravindran, M. Tambe\",\"doi\":\"10.5555/3463952\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Green Security Games (GSGs) have been successfully used in the protection of valuable resources such as fisheries, forests and wildlife. While real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication and real-time, uncertain information, previous game models do not fully address both of these stages simultaneously. Furthermore, adopting existing solution strategies is difficult since they do not scale well for larger, more complex variants of the game models. We therefore first propose a novel GSG model that combines defender allocation, patrolling, real-time drone notification to human patrollers, and drones sending warning signals to attackers. The model further incorporates uncertainty for real-time decision-making within a team of drones and human patrollers. 
Second, we present CombSGPO, a novel and scalable algorithm based on reinforcement learning, to compute a defender strategy for this game model. CombSGPO performs policy search over a multi-dimensional, discrete action space to compute an allocation strategy that is best suited to a best-response patrolling strategy for the defender, learnt by training a multi-agent Deep Q-Network. We show via experiments that CombSGPO converges to better strategies and is more scalable than comparable approaches. Third, we provide a detailed analysis of the coordination and signaling behavior learnt by CombSGPO, showing group formation between defender resources and patrolling formations based on signaling and notifications between resources. Importantly, we find that strategic signaling emerges in the final learnt strategy. Finally, we perform experiments to evaluate these strategies under different levels of uncertainty.\",\"PeriodicalId\":447893,\"journal\":{\"name\":\"Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5555/3463952\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5555/3463952","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

Green Security Games (GSGs) have been successfully used in the protection of valuable resources such as fisheries, forests and wildlife. While real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication and real-time, uncertain information, previous game models do not fully address both of these stages simultaneously. Furthermore, adopting existing solution strategies is difficult since they do not scale well for larger, more complex variants of the game models. We therefore first propose a novel GSG model that combines defender allocation, patrolling, real-time drone notification to human patrollers, and drones sending warning signals to attackers. The model further incorporates uncertainty for real-time decision-making within a team of drones and human patrollers. Second, we present CombSGPO, a novel and scalable algorithm based on reinforcement learning, to compute a defender strategy for this game model. CombSGPO performs policy search over a multi-dimensional, discrete action space to compute an allocation strategy that is best suited to a best-response patrolling strategy for the defender, learnt by training a multi-agent Deep Q-Network. We show via experiments that CombSGPO converges to better strategies and is more scalable than comparable approaches. Third, we provide a detailed analysis of the coordination and signaling behavior learnt by CombSGPO, showing group formation between defender resources and patrolling formations based on signaling and notifications between resources. Importantly, we find that strategic signaling emerges in the final learnt strategy. Finally, we perform experiments to evaluate these strategies under different levels of uncertainty.
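
The abstract describes a two-level defender strategy: policy search over a multi-dimensional, discrete allocation space, combined with a multi-agent Deep Q-Network that learns a best-response patrolling policy under the sampled allocation. The snippet below is a minimal, hypothetical sketch of that structure, not the authors' CombSGPO implementation: a factorized categorical allocation policy updated with a REINFORCE-style gradient on the return of a patrolling episode, alongside a per-agent Q-network. All class names, dimensions, and the dummy rollout are illustrative assumptions.

```python
# Hypothetical sketch of a two-level defender strategy (not the authors' code):
# an allocation policy searched over a multi-dimensional discrete action space,
# plus a per-agent Q-network standing in for the multi-agent DQN patrolling stage.
import torch
import torch.nn as nn

class AllocationPolicy(nn.Module):
    """Factorized categorical policy over a discrete allocation space
    (e.g. how many drones/patrollers are assigned to each region)."""
    def __init__(self, n_regions: int, n_levels: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_regions * n_levels),
        )
        self.n_regions, self.n_levels = n_regions, n_levels

    def forward(self) -> torch.distributions.Categorical:
        # One learned context; logits factorize per region.
        logits = self.net(torch.ones(1, 1)).view(self.n_regions, self.n_levels)
        return torch.distributions.Categorical(logits=logits)

class PatrolQNet(nn.Module):
    """Per-agent Q-network over local observations; sharing its weights across
    patrollers would give a simple multi-agent DQN for the patrolling stage."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def policy_search_step(policy: AllocationPolicy,
                       opt: torch.optim.Optimizer,
                       episode_return: float,
                       allocation: torch.Tensor) -> None:
    """One REINFORCE-style update of the allocation policy, using the return
    of the patrolling episode played under that allocation as the reward."""
    log_prob = policy().log_prob(allocation).sum()
    loss = -episode_return * log_prob
    opt.zero_grad()
    loss.backward()
    opt.step()

if __name__ == "__main__":
    alloc_policy = AllocationPolicy(n_regions=4, n_levels=3)
    opt = torch.optim.Adam(alloc_policy.parameters(), lr=1e-3)
    q_net = PatrolQNet(obs_dim=8, n_actions=5)

    # Dummy rollout: sample an allocation, let the patrolling Q-network pick a
    # greedy move, and update the allocation policy on a stand-in return.
    allocation = alloc_policy().sample()              # shape: (n_regions,)
    greedy_action = q_net(torch.randn(1, 8)).argmax(dim=1)
    fake_return = 1.0                                 # would come from the game
    policy_search_step(alloc_policy, opt, fake_return, allocation)
    print("sampled allocation:", allocation.tolist(),
          "patrol action:", greedy_action.item())
```

In the paper's setting the inner loop would instead train the multi-agent DQN to a best-response patrolling policy before the allocation policy is updated; the sketch collapses that loop into a single placeholder return.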