{"title":"FA-MADT: Enhancing Offline Multiagent Reinforcement Learning With Factorized Attention and Decision Transformers","authors":"Youness Boutyour;Abdellah Idrissi","doi":"10.1109/TAI.2025.3623619","DOIUrl":null,"url":null,"abstract":"Multiagent reinforcement learning (MARL) is a challenging issue in respect of scalability, coordination, and stability, particularly in the offline setting where exploration is restricted. Decision transformers (DTs) are an emerging technology in offline reinforcement learning (RL) for single agents by transforming RL into a sequence modeling problem, but their use in multiagent environments is not fully explored. In this work, we introduce factorized attention for multiagent decision transformers (FA-MADTs), new architecture that enhances coordination and sample efficiency, with design considerations aimed at improving scalability. FA-MADT uses factorized attention (FA) to model interagent dependencies and thus avoids the quadratic complexity of standard self-attention while preserving relevant coordination information. With the integration of return-to-go (RTG) conditioning, FA-MADT is capable of making trajectory-based decisions and thus performs well in long-term planning without the need for online exploration. Furthermore, behavior cloning (BC) regularization improves policy learning by preventing out-of-distribution (OOD) actions and enhancing the generality of the policy over different offline datasets. We evaluate FA-MADT on three benchmark suites—multiagent MuJoCo, the StarCraft Multiagent Challenge (SMAC), and multiagent traffic signal control—demonstrating consistent improvements over state-of-the-art baselines including MADT, TransMix, CQL-MA, and OMIGA. Our method improves coordination efficiency by up to 15%, reduces OOD action rates by 20%, and lowers memory usage by 12%. FA-MADT also reduces attention complexity from <inline-formula><tex-math>$\\mathcal{O}(N^{2})$</tex-math></inline-formula> to <inline-formula><tex-math>$\\mathcal{O}(N\\cdot d\\cdot k)$</tex-math></inline-formula> with <inline-formula><tex-math>$k\\ll N$</tex-math></inline-formula>, supporting scalable policy learning. Additionally, BC regularization improves OOD action selection accuracy by up to 9.4% on the most challenging SMAC scenarios, contributing to more stable offline policy optimization. These results highlight FA-MADT as a promising step toward scalable and generalizable offline multiagent decision-making, with future work needed to validate its robustness in real-world systems involving noisy sensors and physical dynamics.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 5","pages":"2751-2760"},"PeriodicalIF":0.0000,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11207725/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/10/20 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Multiagent reinforcement learning (MARL) poses significant challenges in scalability, coordination, and stability, particularly in the offline setting, where exploration is restricted. Decision transformers (DTs) have emerged as a promising approach to offline reinforcement learning (RL) for single agents by recasting RL as a sequence modeling problem, but their use in multiagent environments remains underexplored. In this work, we introduce factorized attention for multiagent decision transformers (FA-MADT), a new architecture that enhances coordination and sample efficiency, with design considerations aimed at improving scalability. FA-MADT uses factorized attention (FA) to model interagent dependencies, avoiding the quadratic complexity of standard self-attention while preserving the relevant coordination information. Through return-to-go (RTG) conditioning, FA-MADT makes trajectory-based decisions and therefore performs well in long-term planning without the need for online exploration. Furthermore, behavior cloning (BC) regularization improves policy learning by discouraging out-of-distribution (OOD) actions and enhancing the generality of the policy across different offline datasets. We evaluate FA-MADT on three benchmark suites: multiagent MuJoCo, the StarCraft Multiagent Challenge (SMAC), and multiagent traffic signal control, demonstrating consistent improvements over state-of-the-art baselines including MADT, TransMix, CQL-MA, and OMIGA. Our method improves coordination efficiency by up to 15%, reduces OOD action rates by 20%, and lowers memory usage by 12%. FA-MADT also reduces attention complexity from $\mathcal{O}(N^{2})$ to $\mathcal{O}(N\cdot d\cdot k)$ with $k\ll N$, supporting scalable policy learning. Additionally, BC regularization improves OOD action selection accuracy by up to 9.4% on the most challenging SMAC scenarios, contributing to more stable offline policy optimization. These results highlight FA-MADT as a promising step toward scalable and generalizable offline multiagent decision-making, with future work needed to validate its robustness in real-world systems involving noisy sensors and physical dynamics.
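The abstract states the complexity reduction from $\mathcal{O}(N^{2})$ to $\mathcal{O}(N\cdot d\cdot k)$ but not the exact attention factorization. The sketch below illustrates one common way to achieve that cost, routing inter-agent attention through $k\ll N$ learned latent slots (inducing points); the class name, head count, and latent-slot design are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of factorized attention via k << N learned latent
# slots (inducing points). This matches the stated O(N * d * k) cost but
# is NOT the paper's verified implementation.
import torch
import torch.nn as nn

class FactorizedAttention(nn.Module):
    def __init__(self, d_model: int, k: int, num_heads: int = 4):
        super().__init__()
        # k learned summary tokens that mediate all inter-agent attention
        self.latents = nn.Parameter(torch.randn(k, d_model))
        self.to_latents = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.from_latents = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, d_model), one token per agent
        lat = self.latents.unsqueeze(0).expand(x.size(0), -1, -1)  # (batch, k, d)
        # Agents -> latents: cost O(N * k * d) rather than O(N^2 * d)
        summary, _ = self.to_latents(lat, x, x)
        # Latents -> agents: every agent reads the k shared summaries
        out, _ = self.from_latents(x, summary, summary)
        return out

# Usage: 100 agents, 64-dim tokens, 8 latent slots
fa = FactorizedAttention(d_model=64, k=8)
print(fa(torch.randn(2, 100, 64)).shape)  # torch.Size([2, 100, 64])
```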
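Return-to-go conditioning follows the standard decision-transformer recipe: each timestep $t$ is conditioned on the sum of rewards yet to be collected, $\hat{R}_{t}=\sum_{t'\geq t} r_{t'}$. A minimal computation of that quantity (standard to DT-style models, not specific to this paper):

```python
import torch

def returns_to_go(rewards: torch.Tensor) -> torch.Tensor:
    # RTG_t = r_t + r_{t+1} + ... + r_T: reverse, cumulative-sum, reverse back
    return torch.flip(torch.cumsum(torch.flip(rewards, dims=[0]), dim=0), dims=[0])

# A 4-step trajectory with rewards [1, 0, 2, 1] yields RTG [4, 3, 3, 1];
# at inference time the model is conditioned on a target return that is
# decremented by each observed reward.
print(returns_to_go(torch.tensor([1.0, 0.0, 2.0, 1.0])))  # tensor([4., 3., 3., 1.])
```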
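The abstract likewise leaves the form of the BC regularizer unspecified. A common pattern in offline RL (e.g., TD3+BC) adds a weighted penalty pulling the learned policy's actions toward the dataset actions, which discourages OOD action selection; the sketch below shows that generic pattern, with lambda_bc as an illustrative hyperparameter rather than a value from the paper.

```python
import torch
import torch.nn.functional as F

def bc_regularized_loss(task_loss: torch.Tensor,
                        policy_actions: torch.Tensor,
                        dataset_actions: torch.Tensor,
                        lambda_bc: float = 0.1) -> torch.Tensor:
    # Penalize deviation from in-distribution (dataset) actions so the
    # policy stays close to the support of the offline data.
    bc_term = F.mse_loss(policy_actions, dataset_actions)
    return task_loss + lambda_bc * bc_term
```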