A performance-impact based multi-task distributed scheduling algorithm with task removal inference and deadlock avoidance
Jie Li, Runfeng Chen, Chang Wang, Yiting Chen, Yuchong Huang, Xiangke Wang
Autonomous Agents and Multi-Agent Systems 37(2). Published 2023-07-18. DOI: 10.1007/s10458-023-09611-y. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10458-023-09611-y.pdf

Abstract: Multi-task distributed scheduling (MTDS) remains a challenging problem for multi-agent systems deployed on uncertain and dynamic real-world tasks such as search-and-rescue. The Performance Impact (PI) algorithm is a strong solution to MTDS, but it is not guaranteed to converge: it may fall into an infinite cycle of exchanging the same task. In this paper, we improve the PI algorithm by integrating a task removal inference strategy and a deadlock avoidance mechanism. The task removal inference strategy explores the solution space better than the original PI, improving on the suboptimal solutions caused by PI's heuristics for local task selection. In addition, we design a deadlock avoidance mechanism that limits how many times the same task may be removed and blocks consecutive re-insertions of the same task, thereby guaranteeing convergence of the MTDS algorithm. We demonstrate the advantage of the proposed algorithm over the original PI through Monte Carlo simulation of the search-and-rescue task. The results show that the proposed algorithm obtains a lower average time cost and a higher total number of allocated tasks.
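The paper itself gives the full bookkeeping; purely as a hedged sketch, the deadlock-avoidance idea of capping how often the same task may be removed could look like the following, where all names and the cap value are invented for illustration:

```python
# Illustrative sketch of the deadlock-avoidance idea: cap how often the
# same task may be removed from an agent's schedule, so agents cannot
# cycle forever by exchanging one task back and forth. All identifiers
# and the cap value are invented, not taken from the paper.
from collections import defaultdict

class RemovalGuard:
    def __init__(self, max_removals=3):  # assumed cap; a tunable parameter
        self.removals = defaultdict(int)  # (agent_id, task_id) -> count
        self.max_removals = max_removals

    def may_remove(self, agent_id, task_id):
        """True while this agent is still allowed to drop this task."""
        return self.removals[(agent_id, task_id)] < self.max_removals

    def record_removal(self, agent_id, task_id):
        self.removals[(agent_id, task_id)] += 1

guard = RemovalGuard()
if guard.may_remove("uav_1", "rescue_7"):
    guard.record_removal("uav_1", "rescue_7")
    # ... actually remove the task and rebroadcast bids ...
```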
Non-chaotic limit sets in multi-agent learning
Aleksander Czechowski, Georgios Piliouras
Autonomous Agents and Multi-Agent Systems 37(2). Published 2023-07-13. DOI: 10.1007/s10458-023-09612-x

Abstract: Non-convergence is an inherent aspect of adaptive multi-agent systems, and even basic learning models, such as the replicator dynamics, are not guaranteed to equilibrate. Limit cycles, and even more complicated chaotic sets, are in fact possible even in rather simple games, including variants of the Rock-Paper-Scissors game. A key challenge of multi-agent learning theory lies in the characterization of these limit sets, based on qualitative features of the underlying game. Although chaotic behavior in learning dynamics can be precluded by the celebrated Poincaré–Bendixson theorem, the theorem applies directly only to low-dimensional settings. In this work, we attempt to find other characteristics of a game that can force regularity in the limit sets of learning. We show that behavior consistent with the Poincaré–Bendixson theorem (limit cycles, but no chaotic attractor) follows purely from the topological structure of interactions, even in high-dimensional settings with an arbitrary number of players and arbitrary payoff matrices. We prove our result for a wide class of follow-the-regularized-leader (FoReL) dynamics, which generalize replicator dynamics, for binary games characterized by interaction graphs in which the payoffs of each player are affected by only one other player (i.e., interaction graphs of indegree one). Moreover, for cyclic games we provide further insight into the planar structure of limit sets, and in particular limit cycles. We propose simple conditions under which learning comes with efficiency guarantees, implying that FoReL learning achieves a time-averaged sum of payoffs at least as good as that of a Nash equilibrium, thereby connecting the topology of the dynamics to social-welfare analysis.
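For readers unfamiliar with the non-equilibrating behavior the abstract refers to, the following minimal simulation of replicator dynamics on Rock-Paper-Scissors (a textbook example, not the paper's general FoReL setting) shows a trajectory that keeps orbiting the interior equilibrium instead of converging:

```python
# Replicator dynamics on Rock-Paper-Scissors: the mixed strategy orbits
# the interior equilibrium (1/3, 1/3, 1/3) rather than converging to it.
# Any small outward drift is Euler-discretization error, not the
# continuous-time dynamics themselves.
import numpy as np

A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)  # zero-sum RPS payoff matrix

x = np.array([0.5, 0.3, 0.2])  # initial mixed strategy
dt = 0.01
for _ in range(10_000):
    payoffs = A @ x
    x = x + dt * x * (payoffs - x @ payoffs)  # replicator update
    x = np.clip(x, 1e-12, None); x /= x.sum()
print(x)  # still far from (1/3, 1/3, 1/3): the dynamics cycle
```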
Parameterized complexity of multiwinner determination: more effort towards fixed-parameter tractability
Yongjie Yang, Jianxin Wang
Autonomous Agents and Multi-Agent Systems 37(2). Published 2023-06-30. DOI: 10.1007/s10458-023-09610-z. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10458-023-09610-z.pdf

Abstract: We study the parameterized complexity of winner determination for three prevalent k-committee selection rules, namely minimax approval voting (MAV), proportional approval voting (PAV), and Chamberlin–Courant's approval voting (CCAV). These problems are known to be computationally hard. Although they have been studied from the parameterized-complexity point of view with respect to several natural parameters, many of those parameterizations turned out to be W[1]-hard or W[2]-hard. Aiming to obtain more fixed-parameter algorithms, we revisit these problems under more natural single parameters, combined parameters, and structural parameters.
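To make the winner-determination problem concrete, here is a brute-force implementation of MAV based on its standard definition (choose the size-k committee minimizing the maximum Hamming distance to any approval ballot); its exponential running time in the number of candidates is exactly what motivates the search for fixed-parameter algorithms:

```python
# Brute-force MAV winner determination. Enumerates all size-k
# committees, so it is exponential in the number of candidates.
from itertools import combinations

def hamming(committee, ballot, candidates):
    """Hamming distance between a committee and an approval ballot."""
    return sum((c in committee) != (c in ballot) for c in candidates)

def mav_winner(candidates, ballots, k):
    best, best_score = None, float("inf")
    for committee in combinations(candidates, k):
        cset = set(committee)
        score = max(hamming(cset, b, candidates) for b in ballots)
        if score < best_score:
            best, best_score = cset, score
    return best, best_score

ballots = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
print(mav_winner(["a", "b", "c", "d"], ballots, k=2))
```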
Symbolic knowledge injection meets intelligent agents: QoS metrics and experiments
Andrea Agiollo, Andrea Rafanelli, Matteo Magnini, Giovanni Ciatto, Andrea Omicini
Autonomous Agents and Multi-Agent Systems 37(2). Published 2023-06-23. DOI: 10.1007/s10458-023-09609-6. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10458-023-09609-6.pdf

Abstract: Bridging intelligent symbolic agents and sub-symbolic predictors is a long-standing research goal in AI. Among recent integration efforts, symbolic knowledge injection (SKI) proposes algorithms aimed at steering sub-symbolic predictors' learning towards compliance with pre-existing symbolic knowledge bases. However, state-of-the-art contributions on SKI mostly tackle injection from a foundational perspective, often focusing solely on improving the predictive performance of the predictors undergoing injection. Technical contributions, in turn, are tailored to individual methods and experiments, and are therefore poorly interoperable with agent technologies as well as with each other. Intelligent agents may exploit SKI to serve many purposes other than predictive performance alone, provided that adequate technological support exists: for instance, SKI may allow agents to tune the computational, energetic, or data requirements of sub-symbolic predictors. Given that different algorithms may exist to serve all those purposes, agents need both criteria for algorithm selection and a suitable technology for dynamically selecting and exploiting the most suitable algorithm for the problem at hand. Along this line, in this work we design a set of quality-of-service (QoS) metrics for SKI, and a general-purpose software API to enable their application to various SKI algorithms, namely the platform for symbolic knowledge injection (PSyKI). We provide an abstract formulation of four QoS metrics for SKI, describe the design of PSyKI from a software-engineering perspective, and discuss how our QoS metrics are supported by PSyKI. Finally, we demonstrate the effectiveness of both the metrics and PSyKI via a number of experiments in which SKI is applied and assessed through our proposed API. Our empirical analysis demonstrates both the soundness of the proposed metrics and the versatility of PSyKI as the first software tool supporting the application, interchange, and numerical assessment of SKI techniques. To the best of our knowledge, our proposals represent the first attempt to introduce QoS metrics for SKI, along with software tools enabling their practical exploitation by both human and computational agents. In particular, our contributions could be exploited to automate and/or compare the manifold SKI algorithms from the state of the art, hence moving a concrete step forward in the engineering of efficient, robust, and trustworthy software applications that integrate symbolic agents and sub-symbolic predictors.
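As a purely illustrative sketch of what a QoS metric for SKI might measure, the following computes accuracy gain per unit of extra training time; the training and evaluation callables are placeholders supplied by the caller, not the PSyKI API:

```python
# Hypothetical QoS-style metric: relative accuracy gain of knowledge
# injection per second of extra training time. The callables
# train_plain, train_with_ski and evaluate are assumptions of this
# sketch, not functions of the PSyKI library.
import time

def qos_efficiency(train_plain, train_with_ski, evaluate):
    t0 = time.perf_counter(); plain = train_plain()
    t_plain = time.perf_counter() - t0
    t0 = time.perf_counter(); injected = train_with_ski()
    t_ski = time.perf_counter() - t0
    gain = evaluate(injected) - evaluate(plain)  # accuracy delta
    overhead = max(t_ski - t_plain, 1e-9)        # extra seconds spent
    return gain / overhead  # higher = the injected knowledge pays for its cost
```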
Using psychological characteristics of situations for social situation comprehension in support agents
Ilir Kola, Catholijn M. Jonker, M. Birna van Riemsdijk
Autonomous Agents and Multi-Agent Systems 37(2). Published 2023-04-28. DOI: 10.1007/s10458-023-09605-w. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10458-023-09605-w.pdf

Abstract: Support agents that help users in their daily lives need to take into account not only the user's characteristics but also the user's social situation. Existing work on including social context uses some type of situation cue as input to information-processing techniques in order to assess the user's expected behavior. However, research shows that it is also important to determine the meaning of a situation, a step we refer to as social situation comprehension. We propose using psychological characteristics of situations, which have been put forward in social science for ascribing meaning to situations, as the basis for social situation comprehension. Using data from user studies, we evaluate this proposal from two perspectives. First, from a technical perspective, we show that psychological characteristics of situations can be used as input to predict the priority of social situations, and that these characteristics can in turn be predicted from the features of a social situation. Second, we investigate the role of the comprehension step in human–machine meaning-making. We show that psychological characteristics can successfully serve as a basis for the explanations that an agenda-management personal assistant agent gives users about its decisions.
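A hedged sketch of the two-step pipeline suggested by the abstract (situation features to psychological characteristics to priority), using synthetic data and invented feature names rather than the paper's user-study data:

```python
# Two-step prediction sketch: situation features -> psychological
# characteristics (comprehension) -> priority. All data is synthetic
# and the feature/label semantics are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X_situation = rng.random((200, 6))   # e.g. role, place, frequency, ...
Y_psych = rng.random((200, 3))       # e.g. duty, sociality, negativity
y_priority = (Y_psych.sum(axis=1) > 1.5).astype(int)  # toy priority label

step1 = LinearRegression().fit(X_situation, Y_psych)  # comprehension step
step2 = LogisticRegression().fit(step1.predict(X_situation), y_priority)
print(step2.predict(step1.predict(X_situation[:5])))
```

The intermediate psychological characteristics are what make the agent's decisions explainable: they are human-interpretable quantities, unlike raw situation features.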
Actor-critic multi-objective reinforcement learning for non-linear utility functions
Mathieu Reymond, Conor F. Hayes, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé
Autonomous Agents and Multi-Agent Systems 37(2). Published 2023-04-28. DOI: 10.1007/s10458-023-09604-x

Abstract: We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, both in terms of learning efficiency and of the solution concept. A key insight is that a critic that learns a multivariate distribution over future returns, combined with the rewards accumulated so far, allows us to optimize the utility function directly, even when it is non-linear. This vastly increases the range of problems that can be solved compared to single-objective methods or multi-objective methods requiring linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks and show that it learns effectively where baseline approaches fail.
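The key insight can be illustrated in a few lines: apply the non-linear utility to accrued-plus-future returns sampled from a return distribution, then average. This is a minimal sketch with the critic stubbed out by samples, not the paper's actor-critic architecture:

```python
# Monte Carlo estimate of E[u(R_accrued + R_future)] for a non-linear
# utility u. The learned multivariate return distribution is stubbed
# out here by Gaussian samples.
import numpy as np

def expected_utility(u, accrued, future_return_samples):
    """Average utility of accrued-plus-future returns."""
    totals = accrued + future_return_samples  # shape (n, n_objectives)
    return np.mean([u(r) for r in totals])

u = lambda r: r.min()            # non-linear utility: worst objective counts
accrued = np.array([1.0, 0.5])   # rewards collected so far
samples = np.random.default_rng(1).normal([2.0, 2.0], 0.5, size=(1000, 2))
print(expected_utility(u, accrued, samples))
```

Note that for a non-linear u, E[u(R)] differs from u(E[R]), which is why optimizing against the expected return (the classic value) fails here.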
Monte Carlo tree search algorithms for risk-aware and multi-objective reinforcement learning
Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion
Autonomous Agents and Multi-Agent Systems 37(2). Published 2023-04-28. DOI: 10.1007/s10458-022-09596-0. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10458-022-09596-0.pdf

Abstract: In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy, so making decisions based on average future returns is not suitable. For example, in a medical setting a patient may have only one opportunity to treat their illness. Making decisions using just the expected future returns (known in reinforcement learning as the value) cannot account for the range of adverse or positive outcomes a decision may have. Instead, the distribution over expected future returns should be used to represent the critical information the agent requires at decision time, taking both future and accrued returns into consideration. In this paper, we propose two novel Monte Carlo tree search algorithms. First, we present a Monte Carlo tree search algorithm that can compute policies for non-linear utility functions (NLU-MCTS) by optimising the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Second, we propose a distributional Monte Carlo tree search algorithm (DMCTS) that extends NLU-MCTS. DMCTS computes an approximate posterior distribution over the utility of the returns and uses Thompson sampling during planning to compute policies in risk-aware and multi-objective settings. Both algorithms outperform the state of the art in multi-objective reinforcement learning for the expected utility of the returns.
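A minimal sketch of the Thompson-sampling step that DMCTS-style planning uses at decision nodes: keep a posterior over each child's expected utility and descend into the child whose sampled value is largest. The Gaussian posterior here is an assumption made for brevity, not the paper's exact model:

```python
# Thompson-sampling action selection at a tree node: sample each
# child's posterior over expected utility, pick the argmax. Gaussian
# posteriors are a simplifying assumption of this sketch.
import numpy as np

rng = np.random.default_rng(0)

class Child:
    def __init__(self):
        self.utilities = []  # utilities of returns observed through this child

    def sample_posterior(self):
        n = len(self.utilities)
        if n == 0:
            return rng.normal(0.0, 1.0)          # wide prior: explore
        mu = np.mean(self.utilities)
        sd = np.std(self.utilities) + 1e-3
        return rng.normal(mu, sd / np.sqrt(n))   # posterior over the mean

def select(children):
    """Index of the child to descend into this simulation."""
    return max(range(len(children)),
               key=lambda i: children[i].sample_posterior())
```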
Teacher-apprentices RL (TARL): leveraging complex policy distribution through generative adversarial hypernetwork in reinforcement learning
Shi Yuan Tang, Athirai A. Irissappane, Frans A. Oliehoek, Jie Zhang
Autonomous Agents and Multi-Agent Systems 37(2). Published 2023-04-28. DOI: 10.1007/s10458-023-09606-9

Abstract: Typically, a reinforcement learning (RL) algorithm focuses on learning a single deployable policy as the end product. Depending on the initialization method and seed randomization, learning a single policy can lead to convergence to different local optima across runs, especially when the algorithm is sensitive to hyper-parameter tuning. Motivated by the capability of Generative Adversarial Networks (GANs) to learn complex data manifolds, an adversarial training procedure can instead be used to learn a population of well-performing policies. We extend the teacher-student methodology of the knowledge distillation field, developed for typical deep neural network prediction tasks, to the RL paradigm. Instead of learning a single compressed student network, an adversarially trained generative model (a hypernetwork) learns to output the network weights of a population of well-performing policy networks, representing a school of apprentices. Our proposed framework, named Teacher-Apprentices RL (TARL), is modular and can be used in conjunction with many existing RL algorithms. We illustrate the performance gain and improved robustness of combining TARL with various types of RL algorithms, including the direct policy search Cross-Entropy Method, Q-learning, Actor-Critic, and policy gradient-based methods.
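The hypernetwork idea can be sketched as follows: a generator maps a noise vector to the flat weight vector of a small policy network, so each noise sample yields one apprentice policy. The layer sizes are invented, and the adversarial training loop against teacher policies is omitted:

```python
# Hypernetwork sketch: noise z -> flat weight vector -> policy network.
# Sizes are illustrative; TARL additionally trains this generator
# adversarially against teacher policies, which is not shown here.
import torch
import torch.nn as nn

OBS, ACT, HID = 4, 2, 16
n_params = OBS * HID + HID + HID * ACT + ACT  # parameter count of the policy

generator = nn.Sequential(nn.Linear(32, 128), nn.ReLU(),
                          nn.Linear(128, n_params))

def policy_forward(flat_w, obs):
    """Run the generated policy on an observation."""
    i = 0
    W1 = flat_w[i:i + OBS * HID].view(HID, OBS); i += OBS * HID
    b1 = flat_w[i:i + HID]; i += HID
    W2 = flat_w[i:i + HID * ACT].view(ACT, HID); i += HID * ACT
    b2 = flat_w[i:i + ACT]
    h = torch.tanh(obs @ W1.T + b1)
    return torch.softmax(h @ W2.T + b2, dim=-1)

z = torch.randn(32)
flat_w = generator(z)  # one apprentice's weights; new z = new apprentice
print(policy_forward(flat_w, torch.randn(OBS)))
```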
Algorithms for partially robust team formation
Nicolas Schwind, Emir Demirović, Katsumi Inoue, Jean-Marie Lagniez
Autonomous Agents and Multi-Agent Systems 37(2). Published 2023-04-25. DOI: 10.1007/s10458-023-09608-7

Abstract: In one of its simplest forms, team formation involves deploying the least expensive team of agents that covers a set of skills. While current algorithms are reasonably successful at computing the best teams, the resilience of such solutions to change remains an important concern: once a team has been formed, some of the agents considered at the start may turn out to be defective, leaving some skills uncovered. Two recently introduced solution concepts deal with this issue proactively: (1) form a team that is robust to changes, so that after some agent losses all skills remain covered, or (2) opt for a recoverable team, i.e., one that can be "repaired" in the worst case by hiring new agents while keeping the overall deployment cost minimal. In this paper, we introduce the problem of partially robust team formation (PR-TF). Partial robustness is a weaker form of robustness that guarantees a certain degree of skill coverage after some agents are lost. We analyze the computational complexity of PR-TF and provide two complete algorithms for it. We compare the performance of our algorithms with existing methods for robust and recoverable team formation on several existing and newly introduced benchmarks. Our empirical study demonstrates that partial robustness offers an interesting trade-off between (full) robustness and recoverability in terms of computational efficiency, the skill coverage guaranteed after agent losses, and repairability. This paper is an extended and revised version of Schwind et al. (Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'21), pp. 1154–1162, 2021).
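The notion of partial robustness admits a direct brute-force check, sketched below under assumed semantics: after losing any set of at most r agents, at least a fraction alpha of the required skills stays covered. Parameter names are illustrative, and the paper's complete algorithms are far more efficient than this enumeration:

```python
# Brute-force partial-robustness check: enumerate every loss of up to
# r agents and verify that the surviving team still covers at least
# alpha * |skills| of the required skills.
from itertools import combinations

def is_partially_robust(team, skills, r, alpha):
    """team: dict agent -> set of skills; skills: required skill set."""
    for k in range(r + 1):
        for lost in combinations(team, k):
            survivors = [a for a in team if a not in lost]
            covered = set().union(*(team[a] for a in survivors)) if survivors else set()
            if len(covered & skills) < alpha * len(skills):
                return False
    return True

team = {"a1": {"s1", "s2"}, "a2": {"s2", "s3"}, "a3": {"s1", "s3"}}
print(is_partially_robust(team, {"s1", "s2", "s3"}, r=1, alpha=2/3))  # True
```

Setting alpha = 1 recovers full robustness, so this formulation makes the trade-off between the two solution concepts explicit.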
Scaling multi-agent reinforcement learning to full 11 versus 11 simulated robotic football
Andries Smit, Herman A. Engelbrecht, Willie Brink, Arnu Pretorius
Autonomous Agents and Multi-Agent Systems 37(1). Published 2023-03-24. DOI: 10.1007/s10458-023-09603-y. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10458-023-09603-y.pdf

Abstract: Robotic football has long been seen as a grand challenge in artificial intelligence. Despite the recent success of learned policies over heuristics and handcrafted rules in general, current teams in the simulated RoboCup football leagues, where autonomous agents compete against each other, still rely on handcrafted strategies, with only a few using reinforcement learning directly. This limits a learning agent's ability to find stronger high-level strategies for the full game. In this paper, we show that agents can learn competent football strategies in a full 22-player setting with limited computational resources (one GPU and one CPU), from tabula rasa, through self-play. To do this, we build a 2D football simulator with faster simulation times than the RoboCup simulator. We propose several improvements to the standard single-agent PPO training algorithm that help it scale to our multi-agent setting: (1) a policy and critic network with an attention mechanism that scales linearly in the number of agents, (2) network sharing between agents, which allows faster throughput through batching, and (3) Polyak-averaged opponents, league opponents, and freezing the opponent team when necessary. Our experimental results show that stable training in the full 22-player setting is possible. Agents trained in the 22-player setting learn to defeat a variety of handcrafted strategies, and achieve a higher win rate than agents trained in the 4-player setting and evaluated on the full game.
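Improvement (3) is easy to illustrate: a Polyak-averaged opponent trails the learner's parameters as an exponential moving average, which stabilises self-play by preventing the opponent from changing abruptly. A minimal sketch on plain NumPy arrays rather than full policy networks, with an assumed averaging coefficient:

```python
# Polyak-averaged opponent: opponent <- (1 - tau) * opponent + tau * learner.
# tau is an assumed value; arrays stand in for policy-network weights.
import numpy as np

def polyak_update(opponent_params, learner_params, tau=0.005):
    """Move each opponent tensor a small step towards the learner's."""
    return [(1 - tau) * o + tau * l
            for o, l in zip(opponent_params, learner_params)]

opponent = [np.zeros((4, 4)), np.zeros(4)]
learner = [np.ones((4, 4)), np.ones(4)]
for _ in range(100):
    opponent = polyak_update(opponent, learner)
print(opponent[1][0])  # gradually drifts towards the learner's weights
```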