Algorithmically-designed reward shaping for multiagent reinforcement learning in navigation

IF 6.5 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing Pub Date : 2025-09-26 DOI:10.1016/j.neucom.2025.131654

Ifrah Saeed, Andrew C. Cullen, Zainab Zaidi, Sarah Erfani, Tansu Alpcan

Neurocomputing, volume 658, Article 131654. Journal Article. https://www.sciencedirect.com/science/article/pii/S0925231225023264

Citations: 0

Abstract

The practical applicability of multiagent reinforcement learning is hindered by its low sample efficiency and slow learning speed. While reward shaping and expert guidance can partially mitigate these challenges, their efficiency is offset by the need for substantial manual effort. To address these constraints, we introduce Multiagent Environment-aware semi-Automated Guide (MEAG), a novel framework that leverages widely known, highly efficient, and low-resolution single-agent pathfinding algorithms for shaping rewards to guide multiagent reinforcement learning agents. MEAG uses these single-agent solvers over a coarse-grid surrogate that requires minimal manual intervention, and guides agents away from random exploration in a manner that significantly reduces computational costs. When tested across a range of densely and sparsely connected multiagent navigation environments, MEAG consistently outperforms state-of-the-art algorithms, achieving up to 50% faster convergence and 20% higher rewards. These improvements enable the consideration of MARL for more complex real-world pathfinding applications ranging from warehouse automation to search and rescue operations, and swarm robotics.
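The abstract does not spell out how the coarse-grid solver output is turned into a reward, so the following is only an illustrative sketch of the general idea, not the authors' MEAG implementation. It assumes a breadth-first search as the single-agent solver, a 4-connected occupancy grid as the coarse surrogate, and standard potential-based shaping (bonus = gamma * phi(s') - phi(s), with phi equal to the negative coarse distance to the goal). All names (`coarse_distances`, `to_coarse`, `shaping_bonus`, `cell_size`) are hypothetical.

```python
from collections import deque


def coarse_distances(grid, goal):
    """BFS over a coarse occupancy grid (0 = free, 1 = blocked).

    Returns a dict mapping each reachable free cell (row, col) to its
    4-connected step distance from `goal`. Assumed stand-in for the
    single-agent pathfinding solver; not taken from the paper.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def to_coarse(pos, cell_size):
    """Map a fine-resolution (row, col) position to its coarse cell."""
    return (int(pos[0] // cell_size), int(pos[1] // cell_size))


def shaping_bonus(dist, pos, next_pos, cell_size, gamma=0.99):
    """Potential-based shaping term: gamma * phi(next_pos) - phi(pos).

    phi is the negative coarse distance to the goal; cells the BFS never
    reached get a distance one larger than the farthest reachable cell.
    """
    fallback = max(dist.values()) + 1

    def phi(p):
        return -dist.get(to_coarse(p, cell_size), fallback)

    return gamma * phi(next_pos) - phi(pos)
```

In a multiagent setting, one distance map could be precomputed per agent from that agent's goal, and the bonus added to each agent's environment reward at every step. Because the search runs once on the coarse grid rather than at the environment's full resolution, the guidance stays cheap relative to training.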




Source journal

Neurocomputing (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.