Social behavior as a key to learning-based multi-agent pathfinding dilemmas

IF 4.6 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Pub Date : 2025-07-30 DOI:10.1016/j.artint.2025.104397

Chengyang He, Tanishq Duhan, Parth Tulsyan, Patrick Kim, Guillaume Sartoretti

{"title":"Social behavior as a key to learning-based multi-agent pathfinding dilemmas","authors":"Chengyang He, Tanishq Duhan, Parth Tulsyan, Patrick Kim, Guillaume Sartoretti","doi":"10.1016/j.artint.2025.104397","DOIUrl":null,"url":null,"abstract":"<div><div>The Multi-agent Path Finding (MAPF) problem involves finding collision-free paths for a team of agents in a known, static environment, with important applications in warehouse automation, logistics, or last-mile delivery. To meet the needs of these large-scale applications, current learning-based methods often deploy the same fully trained, decentralized network to all agents to improve scalability. However, such parameter sharing typically results in homogeneous behaviors among agents, which may prevent agents from breaking ties around symmetric conflict (e.g., bottlenecks) and might lead to live-/deadlocks. In this paper, we propose SYLPH, a novel learning-based MAPF framework aimed to mitigate the adverse effects of homogeneity by allowing agents to learn and dynamically select different social behaviors (akin to individual, dynamic roles), without affecting the scalability offered by parameter sharing. Specifically, SYLPH offers a novel hierarchical mechanism by introducing Social Value Orientation (SVO) as a temporally extended latent variable that plays a central role in both policy generation and reward assignment. To support this hierarchical decision-making process, we introduce Social-aware Multi-Policy PPO (SMP3O), a reinforcement learning method that ensures stable and effective training through a mechanism for the cross-utilization of advantages. Moreover, we design an SVO-based learning tie-breaking algorithm, allowing agents to proactively avoid collisions, rather than relying solely on post-processing techniques. As a result of this hierarchical decision-making and exchange of social preferences, SYLPH endows agents with the ability to reason about the MAPF task through more latent spaces and nuanced contexts, leading to varied responses that can help break ties around symmetric conflicts. Our comparative experiments show that SYLPH achieves state-of-the-art performance, surpassing other learning-based MAPF planners in random, room-like, and maze-like maps, while our ablation studies demonstrate the advantages of each component in SYLPH. We finally experimentally validate our trained policies on hardware in three types of maps, showing how SYLPH allows agents to find high-quality paths under real-life conditions. Our code and videos are available at: <span><span>marmotlab.github.io/mapf_sylph</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"348 ","pages":"Article 104397"},"PeriodicalIF":4.6000,"publicationDate":"2025-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S000437022500116X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

The Multi-agent Path Finding (MAPF) problem involves finding collision-free paths for a team of agents in a known, static environment, with important applications in warehouse automation, logistics, or last-mile delivery. To meet the needs of these large-scale applications, current learning-based methods often deploy the same fully trained, decentralized network to all agents to improve scalability. However, such parameter sharing typically results in homogeneous behaviors among agents, which may prevent agents from breaking ties around symmetric conflict (e.g., bottlenecks) and might lead to live-/deadlocks. In this paper, we propose SYLPH, a novel learning-based MAPF framework aimed to mitigate the adverse effects of homogeneity by allowing agents to learn and dynamically select different social behaviors (akin to individual, dynamic roles), without affecting the scalability offered by parameter sharing. Specifically, SYLPH offers a novel hierarchical mechanism by introducing Social Value Orientation (SVO) as a temporally extended latent variable that plays a central role in both policy generation and reward assignment. To support this hierarchical decision-making process, we introduce Social-aware Multi-Policy PPO (SMP3O), a reinforcement learning method that ensures stable and effective training through a mechanism for the cross-utilization of advantages. Moreover, we design an SVO-based learning tie-breaking algorithm, allowing agents to proactively avoid collisions, rather than relying solely on post-processing techniques. As a result of this hierarchical decision-making and exchange of social preferences, SYLPH endows agents with the ability to reason about the MAPF task through more latent spaces and nuanced contexts, leading to varied responses that can help break ties around symmetric conflicts. Our comparative experiments show that SYLPH achieves state-of-the-art performance, surpassing other learning-based MAPF planners in random, room-like, and maze-like maps, while our ablation studies demonstrate the advantages of each component in SYLPH. We finally experimentally validate our trained policies on hardware in three types of maps, showing how SYLPH allows agents to find high-quality paths under real-life conditions. Our code and videos are available at: marmotlab.github.io/mapf_sylph.

查看原文本刊更多论文

社会行为是基于学习的多智能体寻径困境的关键

多代理寻路（Multi-agent Path Finding， MAPF）问题涉及在已知的静态环境中为一组代理寻找无冲突的路径，在仓库自动化、物流或最后一英里交付中具有重要应用。为了满足这些大规模应用程序的需求，当前基于学习的方法通常为所有代理部署相同的经过充分训练的分散网络，以提高可扩展性。然而，这种参数共享通常会导致代理之间的同质行为，这可能会阻止代理打破围绕对称冲突（例如，瓶颈）的联系，并可能导致活锁/死锁。在本文中，我们提出了SYLPH，这是一种新的基于学习的MAPF框架，旨在通过允许智能体学习和动态选择不同的社会行为（类似于个体的动态角色），而不影响参数共享提供的可扩展性，从而减轻同质性的不利影响。具体而言，SYLPH通过引入社会价值取向（SVO）作为一个在政策制定和奖励分配中发挥核心作用的时间扩展潜在变量，提供了一种新的分层机制。为了支持这种分层决策过程，我们引入了社会感知多策略PPO (smp30)，这是一种强化学习方法，通过交叉利用优势的机制确保稳定有效的训练。此外，我们设计了一种基于svo的学习断绳算法，允许智能体主动避免碰撞，而不是仅仅依赖后处理技术。由于这种分层决策和社会偏好的交换，SYLPH赋予智能体通过更多潜在空间和微妙背景来推理MAPF任务的能力，从而导致不同的反应，有助于打破对称冲突周围的联系。我们的对比实验表明，SYLPH达到了最先进的性能，在随机、房间和迷宫地图中超越了其他基于学习的MAPF规划器，而我们的消融研究表明了SYLPH中每个组件的优势。最后，我们在三种类型的地图上通过实验验证了我们在硬件上训练好的策略，展示了SYLPH如何允许智能体在现实条件下找到高质量的路径。我们的代码和视频可在：marmotlab.github.io/mapf_sylph。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Artificial Intelligence 工程技术-计算机：人工智能

CiteScore

11.20

自引率

1.40%

发文量

118

审稿时长

8 months

期刊介绍： The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.