Contrastive Learning-Based Agent Modeling for Deep Reinforcement Learning

IF 5.3 · CAS Tier 3 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Wenhao Ma;Yu-Chen Chang;Jie Yang;Yu-Kai Wang;Chin-Teng Lin
{"title":"基于对比学习的深度强化学习智能体建模","authors":"Wenhao Ma;Yu-Chen Chang;Jie Yang;Yu-Kai Wang;Chin-Teng Lin","doi":"10.1109/TETCI.2025.3595684","DOIUrl":null,"url":null,"abstract":"Multi-agent systems often require agents to collaborate with or compete against other agents with diverse goals, behaviors, or strategies. Agent modeling is essential when designing adaptive policies for intelligent machine agents in multi-agent systems, as this is the means by which the controlled agent (ego agent) understands other agents' (modeled agents) behavior and extracts their meaningful policy representations. These representations can be used to enhance the ego agent's adaptive policy which is trained by reinforcement learning. However, existing agent modeling approaches typically assume the availability of local observations from modeled agents during training or a long observation trajectory for policy adaption. To remove these constrictive assumptions and improve agent modeling performance, we devised a <bold>C</b>ontrastive <bold>L</b>earning-based <bold>A</b>gent <bold>M</b>odeling (<bold>CLAM</b>) method that relies only on the local observations from the ego agent during training and execution. With these observations, CLAM is capable of generating consistent high-quality policy representations in real time right from the beginning of each episode. We evaluated the efficacy of our approach in both cooperative and competitive multi-agent environments. The experiment results demonstrate that our approach improves reinforcement learning performance by at least 28% on cooperative and competitive tasks, which exceeds the state-of-the-art.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 5","pages":"3719-3726"},"PeriodicalIF":5.3000,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Contrastive Learning-Based Agent Modeling for Deep Reinforcement Learning\",\"authors\":\"Wenhao Ma;Yu-Chen Chang;Jie Yang;Yu-Kai Wang;Chin-Teng Lin\",\"doi\":\"10.1109/TETCI.2025.3595684\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-agent systems often require agents to collaborate with or compete against other agents with diverse goals, behaviors, or strategies. Agent modeling is essential when designing adaptive policies for intelligent machine agents in multi-agent systems, as this is the means by which the controlled agent (ego agent) understands other agents' (modeled agents) behavior and extracts their meaningful policy representations. These representations can be used to enhance the ego agent's adaptive policy which is trained by reinforcement learning. However, existing agent modeling approaches typically assume the availability of local observations from modeled agents during training or a long observation trajectory for policy adaption. To remove these constrictive assumptions and improve agent modeling performance, we devised a <bold>C</b>ontrastive <bold>L</b>earning-based <bold>A</b>gent <bold>M</b>odeling (<bold>CLAM</b>) method that relies only on the local observations from the ego agent during training and execution. With these observations, CLAM is capable of generating consistent high-quality policy representations in real time right from the beginning of each episode. We evaluated the efficacy of our approach in both cooperative and competitive multi-agent environments. 
The experiment results demonstrate that our approach improves reinforcement learning performance by at least 28% on cooperative and competitive tasks, which exceeds the state-of-the-art.\",\"PeriodicalId\":13135,\"journal\":{\"name\":\"IEEE Transactions on Emerging Topics in Computational Intelligence\",\"volume\":\"9 5\",\"pages\":\"3719-3726\"},\"PeriodicalIF\":5.3000,\"publicationDate\":\"2025-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Emerging Topics in Computational Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11123723/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11123723/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Multi-agent systems often require agents to collaborate with or compete against other agents with diverse goals, behaviors, or strategies. Agent modeling is essential when designing adaptive policies for intelligent machine agents in multi-agent systems, as it is the means by which the controlled agent (ego agent) understands the behavior of other agents (modeled agents) and extracts meaningful representations of their policies. These representations can be used to enhance the ego agent's adaptive policy, which is trained by reinforcement learning. However, existing agent modeling approaches typically assume the availability of local observations from the modeled agents during training, or a long observation trajectory for policy adaptation. To remove these restrictive assumptions and improve agent modeling performance, we devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations of the ego agent during both training and execution. With these observations, CLAM can generate consistent, high-quality policy representations in real time, right from the beginning of each episode. We evaluated the efficacy of our approach in both cooperative and competitive multi-agent environments. The experimental results demonstrate that our approach improves reinforcement learning performance by at least 28% on cooperative and competitive tasks, exceeding the state of the art.
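The abstract gives no implementation details, but the mechanism it names, a contrastive objective that shapes policy representations from the ego agent's own observation trajectories, can be sketched in a few lines. The following is a minimal, hypothetical illustration assuming an InfoNCE-style loss and a GRU segment encoder; the names TrajectoryEncoder and info_nce_loss, the architecture, and all dimensions are assumptions made for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryEncoder(nn.Module):
    """Illustrative encoder: maps an ego-agent observation segment to a
    unit-norm policy embedding. Architecture is an assumption, not CLAM's."""
    def __init__(self, obs_dim: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); summarize with the final hidden state.
        _, h_n = self.gru(obs_seq)
        return F.normalize(self.proj(h_n[-1]), dim=-1)

def info_nce_loss(anchor: torch.Tensor, positive: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over a batch: row i of `anchor` and `positive` come from the
    same episode (same modeled-agent policy); all other rows are negatives."""
    logits = anchor @ positive.t() / temperature  # (batch, batch) similarities
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)

# Usage sketch: two disjoint segments per episode form the positive pair.
encoder = TrajectoryEncoder(obs_dim=16)
seg_a = torch.randn(32, 20, 16)  # 32 episodes, 20 steps, 16-dim ego observations
seg_b = torch.randn(32, 20, 16)
loss = info_nce_loss(encoder(seg_a), encoder(seg_b))
loss.backward()
```

In this sketch, two disjoint observation segments from the same episode form a positive pair, on the assumption that the modeled agents' policies stay fixed within an episode, while segments from the other episodes in the batch act as negatives; the learned embedding could then be fed to the ego agent's reinforcement learning policy as an additional input.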
Source Journal
CiteScore: 10.30
Self-citation rate: 7.50%
Articles published: 147
Journal description: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for IoT and Smart-X technologies.