Latest Articles in Autonomous Agents and Multi-Agent Systems

A performance-impact based multi-task distributed scheduling algorithm with task removal inference and deadlock avoidance
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-07-18 DOI: 10.1007/s10458-023-09611-y
Jie Li, Runfeng Chen, Chang Wang, Yiting Chen, Yuchong Huang, Xiangke Wang
{"title":"A performance-impact based multi-task distributed scheduling algorithm with task removal inference and deadlock avoidance","authors":"Jie Li,&nbsp;Runfeng Chen,&nbsp;Chang Wang,&nbsp;Yiting Chen,&nbsp;Yuchong Huang,&nbsp;Xiangke Wang","doi":"10.1007/s10458-023-09611-y","DOIUrl":"10.1007/s10458-023-09611-y","url":null,"abstract":"<div><p>Multi-task distributed scheduling (MTDS) remains a challenging problem for multi-agent systems used for uncertain and dynamic real-world tasks such as search-and-rescue. The Performance Impact (PI) algorithm is an excellent solution for MTDS, but it suffers from the problem of non-convergence that it may fall into an infinite cycle of exchanging the same task. In this paper, we improve the PI algorithm through the integration of a task removal inference strategy and a deadlock avoidance mechanism. Specifically, the task removal inference strategy results in better exploration performance than the original PI, improving the suboptimal solutions caused by the heuristics for local task selection as done in PI. In addition, we design a deadlock avoidance mechanism that limits the number of times of removing the same task and isolating consecutive inclusions of the same task. Therefore, it guarantees the convergence of the MTDS algorithm. We demonstrate the advantage of the proposed algorithm over the original PI algorithm through Monte Carlo simulation of the search-and-rescue task. The results show that the proposed algorithm can obtain a lower average time cost and the highest total allocation number.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 2","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-023-09611-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46459090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Non-chaotic limit sets in multi-agent learning
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-07-13 DOI: 10.1007/s10458-023-09612-x
Aleksander Czechowski, Georgios Piliouras
{"title":"Non-chaotic limit sets in multi-agent learning","authors":"Aleksander Czechowski,&nbsp;Georgios Piliouras","doi":"10.1007/s10458-023-09612-x","DOIUrl":"10.1007/s10458-023-09612-x","url":null,"abstract":"<div><p>Non-convergence is an inherent aspect of adaptive multi-agent systems, and even basic learning models, such as the replicator dynamics, are not guaranteed to equilibriate. Limit cycles, and even more complicated chaotic sets are in fact possible even in rather simple games, including variants of the Rock-Paper-Scissors game. A key challenge of multi-agent learning theory lies in characterization of these limit sets, based on qualitative features of the underlying game. Although chaotic behavior in learning dynamics can be precluded by the celebrated Poincaré–Bendixson theorem, it is only applicable directly to low-dimensional settings. In this work, we attempt to find other characteristics of a game that can force regularity in the limit sets of learning. We show that behavior consistent with the Poincaré–Bendixson theorem (limit cycles, but no chaotic attractor) follows purely from the topological structure of interactions, even for high-dimensional settings with an arbitrary number of players, and arbitrary payoff matrices. We prove our result for a wide class of follow-the-regularized leader (FoReL) dynamics, which generalize replicator dynamics, for binary games characterized interaction graphs where the payoffs of each player are only affected by one other player (i.e., interaction graphs of indegree one). Moreover, for cyclic games we provide further insight into the planar structure of limit sets, and in particular limit cycles. We propose simple conditions under which learning comes with efficiency guarantees, implying that FoReL learning achieves time-averaged sum of payoffs at least as good as that of a Nash equilibrium, thereby connecting the topology of the dynamics to social-welfare analysis.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 2","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45822746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Parameterized complexity of multiwinner determination: more effort towards fixed-parameter tractability
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-06-30 DOI: 10.1007/s10458-023-09610-z
Yongjie Yang, Jianxin Wang
{"title":"Parameterized complexity of multiwinner determination: more effort towards fixed-parameter tractability","authors":"Yongjie Yang,&nbsp;Jianxin Wang","doi":"10.1007/s10458-023-09610-z","DOIUrl":"10.1007/s10458-023-09610-z","url":null,"abstract":"<div><p>We study the parameterized complexity of winner determination problems for three prevalent <i>k</i>-committee selection rules, namely the minimax approval voting (MAV), the proportional approval voting (PAV), and the Chamberlin–Courant’s approval voting (CCAV). It is known that these problems are computationally hard. Although they have been studied from the parameterized complexity point of view with respect to several natural parameters, many of them turned out to be <span>W[1]</span>-hard or <span>W[2]</span>-hard. Aiming at obtaining plentiful fixed-parameter algorithms, we revisit these problems by considering more natural single parameters, combined parameters, and structural parameters.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 2","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-023-09610-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45276635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Symbolic knowledge injection meets intelligent agents: QoS metrics and experiments
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-06-23 DOI: 10.1007/s10458-023-09609-6
Andrea Agiollo, Andrea Rafanelli, Matteo Magnini, Giovanni Ciatto, Andrea Omicini
{"title":"Symbolic knowledge injection meets intelligent agents: QoS metrics and experiments","authors":"Andrea Agiollo,&nbsp;Andrea Rafanelli,&nbsp;Matteo Magnini,&nbsp;Giovanni Ciatto,&nbsp;Andrea Omicini","doi":"10.1007/s10458-023-09609-6","DOIUrl":"10.1007/s10458-023-09609-6","url":null,"abstract":"<div><p>Bridging intelligent symbolic agents and sub-symbolic predictors is a long-standing research goal in AI. Among the recent integration efforts, symbolic knowledge injection (SKI) proposes algorithms aimed at steering sub-symbolic predictors’ learning towards compliance w.r.t. pre-existing symbolic knowledge bases. However, state-of-the-art contributions about SKI mostly tackle injection from a foundational perspective, often focussing solely on improving the predictive performance of the sub-symbolic predictors undergoing injection. Technical contributions, in turn, are tailored on individual methods/experiments and therefore poorly interoperable with agent technologies as well as among each others. Intelligent agents may exploit SKI to serve many purposes other than predictive performance alone—provided that, of course, adequate technological support exists: for instance, SKI may allow agents to tune computational, energetic, or data requirements of sub-symbolic predictors. Given that different algorithms may exist to serve all those many purposes, some criteria for <i>algorithm selection</i> as well as a suitable <i>technology</i> should be available to let agents dynamically select and exploit the most suitable algorithm for the problem at hand. Along this line, in this work we design a set of <i>quality-of-service</i> (QoS) <i>metrics</i> for SKI, and a <i>general-purpose software API</i> to enable their application to various SKI algorithms—namely, platform for symbolic knowledge injection (PSyKI). We provide an abstract formulation of four QoS metrics for SKI, and describe the design of PSyKI according to a software engineering perspective. Then we discuss how our QoS metrics are supported by PSyKI. Finally, we demonstrate the effectiveness of both our QoS metrics and PSyKI via a number of experiments, where SKI is both applied and assessed via our proposed API. Our empirical analysis demonstrates both the soundness of our proposed metrics and the versatility of PSyKI as the first software tool supporting the application, interchange, and numerical assessment of SKI techniques. To the best of our knowledge, our proposals represent the first attempt to introduce QoS metrics for SKI, and the software tools enabling their <i>practical</i> exploitation for both human and computational agents. In particular, our contributions could be exploited to automate and/or compare the manifold SKI algorithms from the state of the art. 
Hence moving a concrete step forward the engineering of efficient, robust, and trustworthy software applications that integrate symbolic agents and sub-symbolic predictors.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 2","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-023-09609-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48725447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
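The notion of a QoS metric for SKI can be illustrated with a deliberately generic sketch. The function below is not the PSyKI API: the metric names, the predict/num_parameters interface, the dummy predictor, and the ratio-style aggregation are assumptions made only to show what comparing an injected predictor against an uninjected baseline along predictive, memory, and computational dimensions might look like.

```python
import time

def qos_report(predictor, baseline, data, labels):
    """Illustrative quality-of-service comparison between a predictor trained with
    injected symbolic knowledge and an uninjected baseline. Both arguments are
    expected to expose `predict(inputs)` and `num_parameters()` (an assumption
    of this sketch, not a real library interface)."""

    def accuracy(model):
        preds = model.predict(data)
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    def latency(model, repeats=1000):
        start = time.perf_counter()
        for _ in range(repeats):
            model.predict(data)
        return time.perf_counter() - start

    return {
        "accuracy_gain": accuracy(predictor) - accuracy(baseline),                   # predictive QoS
        "parameter_ratio": predictor.num_parameters() / baseline.num_parameters(),   # memory QoS
        "latency_ratio": latency(predictor) / latency(baseline),                     # computational QoS
    }


class _MajorityBaseline:
    """Tiny stand-in predictor used only to make the sketch runnable."""
    def predict(self, data):
        return [0 for _ in data]
    def num_parameters(self):
        return 1

print(qos_report(_MajorityBaseline(), _MajorityBaseline(), data=[1, 2, 3], labels=[0, 0, 1]))
```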
Citations: 2
Using psychological characteristics of situations for social situation comprehension in support agents
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-04-28 DOI: 10.1007/s10458-023-09605-w
Ilir Kola, Catholijn M. Jonker, M. Birna van Riemsdijk
{"title":"Using psychological characteristics of situations for social situation comprehension in support agents","authors":"Ilir Kola,&nbsp;Catholijn M. Jonker,&nbsp;M. Birna van Riemsdijk","doi":"10.1007/s10458-023-09605-w","DOIUrl":"10.1007/s10458-023-09605-w","url":null,"abstract":"<div><p>Support agents that help users in their daily lives need to take into account not only the user’s characteristics, but also the social situation of the user. Existing work on including social context uses some type of situation cue as an input to information processing techniques in order to assess the expected behavior of the user. However, research shows that it is important to also determine the <i>meaning</i> of a situation, a step which we refer to as social situation comprehension. We propose using psychological characteristics of situations, which have been proposed in social science for ascribing meaning to situations, as the basis for social situation comprehension. Using data from user studies, we evaluate this proposal from two perspectives. First, from a technical perspective, we show that psychological characteristics of situations can be used as input to predict the priority of social situations, and that psychological characteristics of situations can be predicted from the features of a social situation. Second, we investigate the role of the comprehension step in human–machine meaning making. We show that psychological characteristics can be successfully used as a basis for explanations given to users about the decisions of an agenda management personal assistant agent.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 2","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-023-09605-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42041729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Actor-critic multi-objective reinforcement learning for non-linear utility functions
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-04-28 DOI: 10.1007/s10458-023-09604-x
Mathieu Reymond, Conor F. Hayes, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé
{"title":"Actor-critic multi-objective reinforcement learning for non-linear utility functions","authors":"Mathieu Reymond,&nbsp;Conor F. Hayes,&nbsp;Denis Steckelmacher,&nbsp;Diederik M. Roijers,&nbsp;Ann Nowé","doi":"10.1007/s10458-023-09604-x","DOIUrl":"10.1007/s10458-023-09604-x","url":null,"abstract":"<div><p>We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for SOTA approaches, both in terms of learning efficiency as well as the solution concept. A key insight is that, by proposing a critic that learns a multi-variate distribution over the returns, which is then combined with accumulated rewards, we can directly optimize on the utility function, even if it is non-linear. This allows us to vastly increase the range of problems that can be solved compared to those which can be handled by single-objective methods or multi-objective methods requiring linear utility functions, yet avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks, and show that it learns effectively where baseline approaches fail.\u0000</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 2","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45209132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Monte Carlo tree search algorithms for risk-aware and multi-objective reinforcement learning
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-04-28 DOI: 10.1007/s10458-022-09596-0
Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion
{"title":"Monte Carlo tree search algorithms for risk-aware and multi-objective reinforcement learning","authors":"Conor F. Hayes,&nbsp;Mathieu Reymond,&nbsp;Diederik M. Roijers,&nbsp;Enda Howley,&nbsp;Patrick Mannion","doi":"10.1007/s10458-022-09596-0","DOIUrl":"10.1007/s10458-022-09596-0","url":null,"abstract":"<div><p>In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions using just the expected future returns–known in reinforcement learning as the value–cannot account for the potential range of adverse or positive outcomes a decision may have. Therefore, we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time by taking both the future and accrued returns into consideration. In this paper, we propose two novel Monte Carlo tree search algorithms. Firstly, we present a Monte Carlo tree search algorithm that can compute policies for nonlinear utility functions (NLU-MCTS) by optimising the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Secondly, we propose a distributional Monte Carlo tree search algorithm (DMCTS) which extends NLU-MCTS. DMCTS computes an approximate posterior distribution over the utility of the returns, and utilises Thompson sampling during planning to compute policies in risk-aware and multi-objective settings. Both algorithms outperform the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 2","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-022-09596-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48259177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Teacher-apprentices RL (TARL): leveraging complex policy distribution through generative adversarial hypernetwork in reinforcement learning
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-04-28 DOI: 10.1007/s10458-023-09606-9
Shi Yuan Tang, Athirai A. Irissappane, Frans A. Oliehoek, Jie Zhang
{"title":"Teacher-apprentices RL (TARL): leveraging complex policy distribution through generative adversarial hypernetwork in reinforcement learning","authors":"Shi Yuan Tang,&nbsp;Athirai A. Irissappane,&nbsp;Frans A. Oliehoek,&nbsp;Jie Zhang","doi":"10.1007/s10458-023-09606-9","DOIUrl":"10.1007/s10458-023-09606-9","url":null,"abstract":"<div><p>Typically, a Reinforcement Learning (RL) algorithm focuses in learning a single deployable policy as the end product. Depending on the initialization methods and seed randomization, learning a single policy could possibly leads to convergence to different local optima across different runs, especially when the algorithm is sensitive to hyper-parameter tuning. Motivated by the capability of Generative Adversarial Networks (GANs) in learning complex data manifold, the adversarial training procedure could be utilized to learn a population of good-performing policies instead. We extend the teacher-student methodology observed in the Knowledge Distillation field in typical deep neural network prediction tasks to RL paradigm. Instead of learning a single compressed student network, an adversarially-trained generative model (hypernetwork) is learned to output network weights of a population of good-performing policy networks, representing a school of apprentices. Our proposed framework, named Teacher-Apprentices RL (TARL), is modular and could be used in conjunction with many existing RL algorithms. We illustrate the performance gain and improved robustness by combining TARL with various types of RL algorithms, including direct policy search Cross-Entropy Method, Q-learning, Actor-Critic, and policy gradient-based methods.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 2","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42627522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Algorithms for partially robust team formation
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-04-25 DOI: 10.1007/s10458-023-09608-7
Nicolas Schwind, Emir Demirović, Katsumi Inoue, Jean-Marie Lagniez
{"title":"Algorithms for partially robust team formation","authors":"Nicolas Schwind,&nbsp;Emir Demirović,&nbsp;Katsumi Inoue,&nbsp;Jean-Marie Lagniez","doi":"10.1007/s10458-023-09608-7","DOIUrl":"10.1007/s10458-023-09608-7","url":null,"abstract":"<div><p>In one of its simplest forms, Team Formation involves deploying the least expensive team of agents while covering a set of skills. While current algorithms are reasonably successful in computing the best teams, the resilience to change of such solutions remains an important concern: Once a team has been formed, some of the agents considered at start may be finally defective and some skills may become uncovered. Two recently introduced solution concepts deal with this issue proactively: 1) form a team which is robust to changes so that after some agent losses, all skills remain covered, and 2) opt for a recoverable team, i.e., it can be \"repaired\" in the worst case by hiring new agents while keeping the overall deployment cost minimal. In this paper, we introduce the problem of <i>partially robust team formation</i> (PR–TF). Partial robustness is a weaker form of robustness which guarantees a certain degree of skill coverage after some agents are lost. We analyze the computational complexity of PR-TF and provide two complete algorithms for it. We compare the performance of our algorithms with the existing methods for robust and recoverable team formation on several existing and newly introduced benchmarks. Our empirical study demonstrates that partial robustness offers an interesting trade-off between (full) robustness and recoverability in terms of computational efficiency, skill coverage guaranteed after agent losses and repairability. This paper is an extended and revised version of as reported by (Schwind et al., Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS’21), pp. 1154–1162, 2021).</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 2","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48970838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Scaling multi-agent reinforcement learning to full 11 versus 11 simulated robotic football
IF 1.9, CAS Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems Pub Date : 2023-03-24 DOI: 10.1007/s10458-023-09603-y
Andries Smit, Herman A. Engelbrecht, Willie Brink, Arnu Pretorius
{"title":"Scaling multi-agent reinforcement learning to full 11 versus 11 simulated robotic football","authors":"Andries Smit,&nbsp;Herman A. Engelbrecht,&nbsp;Willie Brink,&nbsp;Arnu Pretorius","doi":"10.1007/s10458-023-09603-y","DOIUrl":"10.1007/s10458-023-09603-y","url":null,"abstract":"<div><p>Robotic football has long been seen as a grand challenge in artificial intelligence. Despite recent success of learned policies over heuristics and handcrafted rules in general, current teams in the simulated RoboCup football leagues, where autonomous agents compete against each other, still rely on handcrafted strategies with only a few using reinforcement learning directly. This limits a learning agent’s ability to find stronger high-level strategies for the full game. In this paper, we show that it is possible for agents to learn competent football strategies on a full 22 player setting using limited computation resources (one GPU and one CPU), from tabula rasa through self-play. To do this, we build a 2D football simulator with faster simulation times than the RoboCup simulator. We propose various improvements to the standard single-agent PPO training algorithm which help it scale to our multi-agent setting. These improvements include (1) using a policy and critic network with an attention mechanism that scales linearly in the number of agents, (2) sharing networks between agents which allow for faster throughput using batching, and (3) using Polyak averaged opponents, league opponents and freezing the opponent team when necessary. We show through experimental results that stable training in the full 22 player setting is possible. Agents trained in the 22 player setting learn to defeat a variety of handcrafted strategies, and also achieve a higher win rate compared to agents trained in the 4 player setting and evaluated in the full game.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-023-09603-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47091125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0