Algorithmically-designed reward shaping for multiagent reinforcement learning in navigation

IF 6.5 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ifrah Saeed , Andrew C. Cullen , Zainab Zaidi , Sarah Erfani , Tansu Alpcan
{"title":"Algorithmically-designed reward shaping for multiagent reinforcement learning in navigation","authors":"Ifrah Saeed ,&nbsp;Andrew C. Cullen ,&nbsp;Zainab Zaidi ,&nbsp;Sarah Erfani ,&nbsp;Tansu Alpcan","doi":"10.1016/j.neucom.2025.131654","DOIUrl":null,"url":null,"abstract":"<div><div>The practical applicability of multiagent reinforcement learning is hindered by its low sample efficiency and slow learning speed. While reward shaping and expert guidance can partially mitigate these challenges, their efficiency is offset by the need for substantial manual effort. To address these constraints, we introduce Multiagent Environment-aware semi-Automated Guide (MEAG), a novel framework that leverages widely known, highly efficient, and low-resolution single-agent pathfinding algorithms for shaping rewards to guide multiagent reinforcement learning agents. MEAG uses these single-agent solvers over a coarse-grid surrogate that requires minimal manual intervention, and guides agents away from random exploration in a manner that significantly reduces computational costs. When tested across a range of densely and sparsely connected multiagent navigation environments, MEAG consistently outperforms state-of-the-art algorithms, achieving up to <span><math><mn>50</mn><mspace></mspace><mi>%</mi></math></span> faster convergence and <span><math><mn>20</mn><mspace></mspace><mi>%</mi></math></span> higher rewards. These improvements enable the consideration of MARL for more complex real-world pathfinding applications ranging from warehouse automation to search and rescue operations, and swarm robotics.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"658 ","pages":"Article 131654"},"PeriodicalIF":6.5000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225023264","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

The practical applicability of multiagent reinforcement learning is hindered by its low sample efficiency and slow learning speed. While reward shaping and expert guidance can partially mitigate these challenges, their efficiency is offset by the need for substantial manual effort. To address these constraints, we introduce Multiagent Environment-aware semi-Automated Guide (MEAG), a novel framework that leverages widely known, highly efficient, and low-resolution single-agent pathfinding algorithms for shaping rewards to guide multiagent reinforcement learning agents. MEAG uses these single-agent solvers over a coarse-grid surrogate that requires minimal manual intervention, and guides agents away from random exploration in a manner that significantly reduces computational costs. When tested across a range of densely and sparsely connected multiagent navigation environments, MEAG consistently outperforms state-of-the-art algorithms, achieving up to 50% faster convergence and 20% higher rewards. These improvements enable the consideration of MARL for more complex real-world pathfinding applications ranging from warehouse automation to search and rescue operations, and swarm robotics.
导航中多智能体强化学习的算法设计奖励塑造
多智能体强化学习的样本效率低、学习速度慢,阻碍了多智能体强化学习的实际应用。虽然奖励形成和专家指导可以部分缓解这些挑战,但它们的效率被大量人工工作的需求所抵消。为了解决这些限制,我们引入了多智能体环境感知半自动指南(MEAG),这是一个新颖的框架,利用众所周知的、高效的、低分辨率的单智能体寻路算法来形成奖励,以指导多智能体强化学习智能体。MEAG在粗网格代理上使用这些单代理求解器,需要最少的人工干预,并以显著降低计算成本的方式引导代理远离随机探索。在一系列密集和稀疏连接的多智能体导航环境中进行测试时,MEAG始终优于最先进的算法,实现了高达50%的收敛速度和20%的高回报。这些改进使MARL能够用于更复杂的现实世界寻路应用,从仓库自动化到搜索和救援行动,以及群体机器人。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Neurocomputing
Neurocomputing 工程技术-计算机:人工智能
CiteScore
13.10
自引率
10.00%
发文量
1382
审稿时长
70 days
期刊介绍: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信