Distributed Policy Space Response Oracles in Two-Player Zero-Sum Games

IF 8.9 · CAS Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Neural Networks and Learning Systems · Pub Date: 2025-04-07 · DOI: 10.1109/TNNLS.2025.3550827

Hongsong Tang, Yingzhuo Liu, Letian Ni, Liuyu Xiang, Yaodong Yang, Ke Bi, Zhaofeng He

Vol. 36, no. 6, pp. 9893-9904 · IEEE Xplore: https://ieeexplore.ieee.org/document/10950104/

Citations: 0

Abstract

Policy space response oracle (PSRO) is a population-based algorithm that can be used to solve two-player zero-sum games. In the PSRO solution framework, optimizing policy diversity is crucial for addressing nontransitive game problems, helping the agent population avoid exploitation by unfamiliar opponents. In addition, while deep reinforcement learning is highly effective in solving complex game environments, its integration with PSRO remains fragmented and lacking in effective coordination. In this study, we propose distributed PSRO to efficiently solve complex game scenarios. To enhance diversity while managing optimization costs, we introduce TOP-K truncation, which prioritizes high-quality opponents and limits the size of the policy pool during sampling. This approach not only reduces interference from less effective strategies but also ensures computational efficiency by seamlessly integrating with our distributed training framework. We also design the distributed training framework to incorporate diversity estimation directly into the sampling process, achieving diversity optimization without incurring additional computational overhead. Furthermore, we introduce the opponent first (OF) method, which enhances decision-making by leveraging opponent information during interaction sampling. We perform experimental validation using a nontransitive mixture model and AlphaStar888 to confirm the effectiveness of the TOP-K truncation approach. Finally, we demonstrate the feasibility and efficiency of the distributed training framework and the OF approach in a Google Research Football 11 versus 11 scenario.
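The abstract does not spell out the TOP-K selection rule. The sketch below is a minimal illustration, not the authors' implementation. It assumes that an opponent's "quality" is the probability the PSRO meta-solver (for example, a Nash solver over the empirical payoff matrix) assigns to it, and that each distributed rollout worker draws its opponent independently from the truncated, renormalized distribution. The names `top_k_truncate` and `sample_opponents` are hypothetical.

```python
import numpy as np


def top_k_truncate(meta_strategy, k):
    """Keep the k most heavily weighted policies and renormalize.

    Assumption (not from the paper): an opponent's "quality" is its weight in
    the PSRO meta-strategy, e.g. a Nash mixture over the empirical payoff matrix.
    """
    sigma = np.asarray(meta_strategy, dtype=float)
    if sigma.ndim != 1 or np.any(sigma < 0) or sigma.sum() <= 0:
        raise ValueError("meta_strategy must be a non-negative 1-D vector with positive mass")
    k = max(1, min(int(k), sigma.size))  # clamp K to [1, pool size]
    # Indices sorted by descending weight; "stable" keeps lower indices first on ties.
    keep = np.argsort(-sigma, kind="stable")[:k]
    truncated = np.zeros_like(sigma)
    truncated[keep] = sigma[keep]
    # The largest weight is positive and always kept, so the sum is > 0.
    return truncated / truncated.sum()


def sample_opponents(meta_strategy, k, n_workers, seed=None):
    """Hypothetical helper: draw one opponent index per rollout worker."""
    probs = top_k_truncate(meta_strategy, k)
    rng = np.random.default_rng(seed)
    # Zero-probability (truncated) policies are never drawn.
    return rng.choice(probs.size, size=n_workers, p=probs)


if __name__ == "__main__":
    sigma = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
    print(top_k_truncate(sigma, k=3))
    print(sample_opponents(sigma, k=3, n_workers=8, seed=0))
```

For `sigma = [0.05, 0.40, 0.10, 0.30, 0.15]` and `k = 3`, policies 0 and 2 receive zero probability and the remaining weights are rescaled by 1/0.85, so workers only ever face the three most heavily weighted opponents. Fixing K also bounds how many distinct opponent checkpoints a worker must hold, which is one plausible reading of the abstract's claim that truncation limits the size of the policy pool during sampling.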

Source journal

IEEE Transactions on Neural Networks and Learning Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore: 23.80

Self-citation rate: 9.60%

Articles published: 2102

Review time: 3-8 weeks

Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.