MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale

arXiv - CS - Multiagent Systems. Pub Date: 2024-08-29. DOI: arxiv-2409.00134

Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik


Citations: 0

Abstract



Multi-agent pathfinding (MAPF) is a challenging computational problem that typically requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally is NP-hard, yet efficient solutions are critical for numerous applications, including automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Following current trends in machine learning, we have created a foundation model for MAPF problems called MAPF-GPT. Using imitation learning, we have trained a policy on a set of pre-collected sub-optimal expert trajectories. The policy can generate actions under partial observability without additional heuristics, reward functions, or communication with other agents. The resulting MAPF-GPT model demonstrates zero-shot learning abilities when solving MAPF problem instances that were not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient in inference mode.
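To make the imitation-learning idea in the abstract concrete, here is a minimal behavior-cloning sketch: a policy is fitted by cross-entropy to (local observation, expert action) pairs and then acts from its own observation alone. Everything in it is an illustrative assumption rather than the paper's method: the five-action set, the toy `expert_action` rule, the goal-offset observation, and the linear softmax policy are all hypothetical stand-ins, since the abstract does not describe the model architecture, the observation encoding, or the expert solver.

```python
import math

# Hypothetical action set; index i in a policy output refers to ACTIONS[i].
ACTIONS = ["wait", "up", "down", "left", "right"]


def expert_action(obs):
    """Toy stand-in for an expert solver: step along the larger goal offset."""
    dx, dy = obs
    if dx == 0 and dy == 0:
        return ACTIONS.index("wait")
    if abs(dx) >= abs(dy):
        return ACTIONS.index("right" if dx > 0 else "left")
    return ACTIONS.index("up" if dy > 0 else "down")


def make_dataset(radius=3):
    """Collect (observation, expert action) pairs for all offsets in a square."""
    offsets = range(-radius, radius + 1)
    return [((dx, dy), expert_action((dx, dy))) for dx in offsets for dy in offsets]


def features(obs):
    """Encode a local observation (here only the goal offset) as a feature vector."""
    dx, dy = obs
    return [1.0, float(dx > 0), float(dx < 0), float(dy > 0), float(dy < 0),
            float(abs(dx) >= abs(dy))]


def action_probs(weights, obs):
    """Softmax over per-action linear scores."""
    x = features(obs)
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_loss(weights, dataset):
    """Average cross-entropy between the policy and the expert labels."""
    total = sum(math.log(action_probs(weights, obs)[a]) for obs, a in dataset)
    return -total / len(dataset)


def train(dataset, epochs=200, lr=0.5):
    """Behavior cloning by full-batch gradient descent on cross-entropy."""
    n_features = len(features(dataset[0][0]))
    weights = [[0.0] * n_features for _ in ACTIONS]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in ACTIONS]
        for obs, a in dataset:
            x = features(obs)
            probs = action_probs(weights, obs)
            for k in range(len(ACTIONS)):
                err = probs[k] - (1.0 if k == a else 0.0)
                for j, xj in enumerate(x):
                    grad[k][j] += err * xj
        for k in range(len(ACTIONS)):
            for j in range(n_features):
                weights[k][j] -= lr * grad[k][j] / len(dataset)
    return weights


def act(weights, obs):
    """Pick the most probable action index for one agent's own observation."""
    probs = action_probs(weights, obs)
    return max(range(len(ACTIONS)), key=probs.__getitem__)


if __name__ == "__main__":
    data = make_dataset()
    policy = train(data)
    print("training loss:", mean_loss(policy, data))
    print("action at offset (2, 0):", ACTIONS[act(policy, (2, 0))])
```

In a decentralized setting, each agent would call `act` on its own observation at every step, with no reward signal and no messages between agents, which mirrors the setup the abstract describes. A real system would replace the toy expert with trajectories from an actual MAPF solver and the linear model with a far more expressive one.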

