{"title":"FA-MADT: Enhancing Offline Multiagent Reinforcement Learning With Factorized Attention and Decision Transformers","authors":"Youness Boutyour;Abdellah Idrissi","doi":"10.1109/TAI.2025.3623619","DOIUrl":null,"url":null,"abstract":"Multiagent reinforcement learning (MARL) is a challenging issue in respect of scalability, coordination, and stability, particularly in the offline setting where exploration is restricted. Decision transformers (DTs) are an emerging technology in offline reinforcement learning (RL) for single agents by transforming RL into a sequence modeling problem, but their use in multiagent environments is not fully explored. In this work, we introduce factorized attention for multiagent decision transformers (FA-MADTs), new architecture that enhances coordination and sample efficiency, with design considerations aimed at improving scalability. FA-MADT uses factorized attention (FA) to model interagent dependencies and thus avoids the quadratic complexity of standard self-attention while preserving relevant coordination information. With the integration of return-to-go (RTG) conditioning, FA-MADT is capable of making trajectory-based decisions and thus performs well in long-term planning without the need for online exploration. Furthermore, behavior cloning (BC) regularization improves policy learning by preventing out-of-distribution (OOD) actions and enhancing the generality of the policy over different offline datasets. We evaluate FA-MADT on three benchmark suites—multiagent MuJoCo, the StarCraft Multiagent Challenge (SMAC), and multiagent traffic signal control—demonstrating consistent improvements over state-of-the-art baselines including MADT, TransMix, CQL-MA, and OMIGA. Our method improves coordination efficiency by up to 15%, reduces OOD action rates by 20%, and lowers memory usage by 12%. FA-MADT also reduces attention complexity from <inline-formula><tex-math>$\\mathcal{O}(N^{2})$</tex-math></inline-formula> to <inline-formula><tex-math>$\\mathcal{O}(N\\cdot d\\cdot k)$</tex-math></inline-formula> with <inline-formula><tex-math>$k\\ll N$</tex-math></inline-formula>, supporting scalable policy learning. Additionally, BC regularization improves OOD action selection accuracy by up to 9.4% on the most challenging SMAC scenarios, contributing to more stable offline policy optimization. These results highlight FA-MADT as a promising step toward scalable and generalizable offline multiagent decision-making, with future work needed to validate its robustness in real-world systems involving noisy sensors and physical dynamics.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 5","pages":"2751-2760"},"PeriodicalIF":0.0000,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11207725/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/10/20 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Multiagent reinforcement learning (MARL) poses significant challenges in scalability, coordination, and stability, particularly in the offline setting, where exploration is restricted. Decision transformers (DTs) have emerged as a promising approach to offline reinforcement learning (RL) for single agents by recasting RL as a sequence modeling problem, but their use in multiagent environments remains underexplored. In this work, we introduce factorized attention for multiagent decision transformers (FA-MADT), a new architecture that enhances coordination and sample efficiency, with design considerations aimed at improving scalability. FA-MADT uses factorized attention (FA) to model interagent dependencies, avoiding the quadratic complexity of standard self-attention while preserving the relevant coordination information. Through return-to-go (RTG) conditioning, FA-MADT makes trajectory-based decisions and therefore performs well in long-term planning without the need for online exploration. Furthermore, behavior cloning (BC) regularization improves policy learning by discouraging out-of-distribution (OOD) actions and enhancing the generality of the policy across different offline datasets. We evaluate FA-MADT on three benchmark suites: multiagent MuJoCo, the StarCraft Multiagent Challenge (SMAC), and multiagent traffic signal control, demonstrating consistent improvements over state-of-the-art baselines including MADT, TransMix, CQL-MA, and OMIGA. Our method improves coordination efficiency by up to 15%, reduces OOD action rates by 20%, and lowers memory usage by 12%. FA-MADT also reduces attention complexity from $\mathcal{O}(N^{2})$ to $\mathcal{O}(N\cdot d\cdot k)$ with $k\ll N$, supporting scalable policy learning. Additionally, BC regularization improves OOD action selection accuracy by up to 9.4% on the most challenging SMAC scenarios, contributing to more stable offline policy optimization. These results highlight FA-MADT as a promising step toward scalable and generalizable offline multiagent decision-making, with future work needed to validate its robustness in real-world systems involving noisy sensors and physical dynamics.
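The abstract states the complexity reduction from $\mathcal{O}(N^{2})$ to $\mathcal{O}(N\cdot d\cdot k)$ but not the exact attention factorization. The sketch below illustrates one common way to achieve that cost, routing inter-agent attention through $k\ll N$ learned latent slots (inducing points); the class name, head count, and latent-slot design are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of factorized attention via k << N learned latent
# slots (inducing points). This matches the stated O(N * d * k) cost but
# is NOT the paper's verified implementation.
import torch
import torch.nn as nn

class FactorizedAttention(nn.Module):
    def __init__(self, d_model: int, k: int, num_heads: int = 4):
        super().__init__()
        # k learned summary tokens that mediate all inter-agent attention
        self.latents = nn.Parameter(torch.randn(k, d_model))
        self.to_latents = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.from_latents = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, d_model), one token per agent
        lat = self.latents.unsqueeze(0).expand(x.size(0), -1, -1)  # (batch, k, d)
        # Agents -> latents: cost O(N * k * d) rather than O(N^2 * d)
        summary, _ = self.to_latents(lat, x, x)
        # Latents -> agents: every agent reads the k shared summaries
        out, _ = self.from_latents(x, summary, summary)
        return out

# Usage: 100 agents, 64-dim tokens, 8 latent slots
fa = FactorizedAttention(d_model=64, k=8)
print(fa(torch.randn(2, 100, 64)).shape)  # torch.Size([2, 100, 64])
```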
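Return-to-go conditioning follows the standard decision-transformer recipe: each timestep $t$ is conditioned on the sum of rewards yet to be collected, $\hat{R}_{t}=\sum_{t'\geq t} r_{t'}$. A minimal computation of that quantity (standard to DT-style models, not specific to this paper):

```python
import torch

def returns_to_go(rewards: torch.Tensor) -> torch.Tensor:
    # RTG_t = r_t + r_{t+1} + ... + r_T: reverse, cumulative-sum, reverse back
    return torch.flip(torch.cumsum(torch.flip(rewards, dims=[0]), dim=0), dims=[0])

# A 4-step trajectory with rewards [1, 0, 2, 1] yields RTG [4, 3, 3, 1];
# at inference time the model is conditioned on a target return that is
# decremented by each observed reward.
print(returns_to_go(torch.tensor([1.0, 0.0, 2.0, 1.0])))  # tensor([4., 3., 3., 1.])
```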
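The abstract likewise leaves the form of the BC regularizer unspecified. A common pattern in offline RL (e.g., TD3+BC) adds a weighted penalty pulling the learned policy's actions toward the dataset actions, which discourages OOD action selection; the sketch below shows that generic pattern, with lambda_bc as an illustrative hyperparameter rather than a value from the paper.

```python
import torch
import torch.nn.functional as F

def bc_regularized_loss(task_loss: torch.Tensor,
                        policy_actions: torch.Tensor,
                        dataset_actions: torch.Tensor,
                        lambda_bc: float = 0.1) -> torch.Tensor:
    # Penalize deviation from in-distribution (dataset) actions so the
    # policy stays close to the support of the offline data.
    bc_term = F.mse_loss(policy_actions, dataset_actions)
    return task_loss + lambda_bc * bc_term
```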