{"title":"Open-ended coordination for multi-agent systems using modular open policies","authors":"David Rother, Joni Pajarinen, Jan Peters, Thomas H. Weisswange","doi":"10.1007/s10458-025-09723-7","DOIUrl":"10.1007/s10458-025-09723-7","url":null,"abstract":"<div><p>Significant advances have been made on the multi-agent challenge of learning policies for acting in ad hoc teamwork. In ad hoc teamwork, a team of agents must cooperate effectively without prior coordination or communication. Many existing approaches, however, struggle to perform well in open environments where the setting can change significantly during deployment. This paper presents a new reinforcement learning approach that tackles collaboration in open environments, controlling one agent alongside a changing number of distinct other agents, each with an individual task. The approach uses policy blending based on an online goal inference module and a collection of learned policies modeling the individual interaction impact between the agent and populations of partners with different tasks. Blending is done using the estimated goals of others and a posterior-based action blending with entropy adjustment and regularization. Our approach addresses issues of existing policy blending mechanisms, such as conflicting modes in action distributions, which lead to oscillation and instability, and the need to adapt dynamically to uncertain states. In experiments in two collaborative open environments based on Overcooked and Level-based Foraging, our approach outperforms a baseline learner, trained with the joint reward of all agents, across changes to both agents and tasks. Ablation studies further highlight the importance of our posterior-based blending mechanism to achieve high rewards as well as the provided goal weighting. The proposed approach provides an important step towards the application of reinforcement learning to AI assistance beyond strictly closed worlds and towards more realistic scenarios.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09723-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145256630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the graph theory of majority illusions: theoretical results and computational experiments","authors":"Maaike Venema-Los, Zoé Christoff, Davide Grossi","doi":"10.1007/s10458-025-09720-w","DOIUrl":"10.1007/s10458-025-09720-w","url":null,"abstract":"<div><p>The popularity of an opinion in one’s direct circles is not necessarily a good indicator of its popularity in one’s entire community. Network structures make local information about global properties of the group potentially inaccurate, and the way a social network is wired constrains what kind of information distortion can actually occur. In this paper, we discuss which classes of networks allow for a large enough proportion of the population to get a wrong enough impression about the overall distribution of opinions. We start by focusing on the ‘majority illusion’, the case where one sees a majority opinion in one’s direct circles that differs from the global majority. We show that no network structure can guarantee that most agents see the correct majority. We then perform computational experiments to study the likelihood of majority illusions in different classes of networks. Finally, we generalize to other types of illusions.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09720-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
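The "majority illusion" condition in the abstract above is concrete enough to check directly. A minimal sketch (assuming binary opinions on an undirected graph given as an adjacency list; all names are illustrative, and this is not the authors' code):

```python
from typing import Dict, List

def majority_illusion_agents(adj: Dict[str, List[str]], opinion: Dict[str, int]) -> List[str]:
    """Return agents whose local majority differs from the global majority.

    An agent is under the majority illusion when the opinion held by most of
    its neighbors is not the opinion held by most of the population.
    Ties count as 'no majority' and never produce an illusion.
    """
    n = len(opinion)
    ones = sum(opinion.values())
    if 2 * ones == n:
        return []  # no global majority, so no illusion is possible
    global_major = 1 if 2 * ones > n else 0

    illuded = []
    for agent, neighbors in adj.items():
        if not neighbors:
            continue
        local_ones = sum(opinion[v] for v in neighbors)
        if 2 * local_ones == len(neighbors):
            continue  # local tie: agent sees no majority
        local_major = 1 if 2 * local_ones > len(neighbors) else 0
        if local_major != global_major:
            illuded.append(agent)
    return illuded

# A star graph: the hub holds the globally rare opinion, so every leaf
# sees a local majority (just the hub) contradicting the global majority.
adj = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
opinion = {"hub": 1, "a": 0, "b": 0, "c": 0}
print(majority_illusion_agents(adj, opinion))  # → ['a', 'b', 'c']
```

The star example illustrates the paper's point: a high-degree minority node can place most of the population under the illusion at once.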
{"title":"Designing policies for transition-independent multiagent systems that are robust to communication loss","authors":"Mustafa O. Karabag, Cyrus Neary, Ufuk Topcu","doi":"10.1007/s10458-025-09721-9","DOIUrl":"10.1007/s10458-025-09721-9","url":null,"abstract":"<div><p>In a cooperative multiagent system, a collection of agents executes a joint policy in order to achieve some common objective. The successful deployment of such systems hinges on the availability of reliable inter-agent communication. However, many sources of potential disruption to communication exist in practice, such as radio interference, hardware failure, and adversarial attacks. In this work, we develop joint policies for cooperative multiagent systems that are robust to potential losses in communication. More specifically, we develop joint policies for cooperative Markov games with independent transitions and joint reach-avoid objectives. First, we propose an algorithm for the decentralized execution of joint policies during periods of communication loss. This algorithm is designed to work under arbitrary communication partitions between the agents. Next, we use the total correlation of the state-action process induced by a joint policy as a measure of the intrinsic dependencies between the agents. We then use this measure to lower-bound the performance of a joint policy under randomly intermittent or adversarial communication loss scenarios. We show the existence of a multiagent decision-making environment in which this bound is tight—the highest performance under intermittent communication loss, for any policy execution mechanism, is of the same order as the bound. We then present an algorithm that maximizes a proxy to this lower bound in order to synthesize minimum-dependency joint policies that remain performant under communication loss. Through two-agent and three-agent numerical experiments, we show that the proposed minimum-dependency policies require minimal coordination between the agents while incurring little to no loss in performance; the total correlation value of the synthesized policy is significantly lower than the total correlation value of the baseline policy which does not take potential communication losses into account. As a result, the performance of the minimum-dependency policies remains consistently high regardless of whether or not communication is available. By contrast, the performance of the baseline policy decreases drastically when communication is lost.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144880881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
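Total correlation, the dependency measure used in the abstract above, is the sum of the marginal entropies minus the joint entropy. A minimal sketch for a discrete joint distribution over two agents' actions (names illustrative; the paper applies this to the full state-action process, not a single-step distribution):

```python
import math

def total_correlation(joint: dict) -> float:
    """Total correlation C(X1;X2) = H(X1) + H(X2) - H(X1,X2), in bits.

    `joint` maps (x1, x2) pairs to probabilities. The value is zero iff
    the two variables are independent; larger values mean the agents'
    actions are more tightly coupled.
    """
    def entropy(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    m1, m2 = {}, {}  # marginal distributions
    for (x1, x2), p in joint.items():
        m1[x1] = m1.get(x1, 0.0) + p
        m2[x2] = m2.get(x2, 0.0) + p
    return entropy(m1) + entropy(m2) - entropy(joint)

# Perfectly correlated actions: one bit of intrinsic dependency.
synced = {("left", "left"): 0.5, ("right", "right"): 0.5}
# Independent uniform actions: no dependency.
indep = {(a, b): 0.25 for a in ("left", "right") for b in ("left", "right")}
print(total_correlation(synced))  # → 1.0
print(total_correlation(indep))   # → 0.0
```

A minimum-dependency policy in the paper's sense drives this quantity down, so that each agent's behavior is predictable to its partners without communication.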
{"title":"Information elicitation mechanisms for Bayesian auctions","authors":"Jing Chen, Bo Li, Yingkai Li","doi":"10.1007/s10458-025-09718-4","DOIUrl":"10.1007/s10458-025-09718-4","url":null,"abstract":"<div><p>In this paper we design information elicitation mechanisms for Bayesian auctions. While in Bayesian mechanism design the distributions of the players’ private types are often assumed to be common knowledge, information elicitation considers the situation where the players know the distributions better than the decision maker. To weaken the information assumption in Bayesian auctions, we consider an information structure where the knowledge about the distributions is <i>arbitrarily scattered</i> among the players. In such an unstructured information setting, we design mechanisms for unit-demand auctions and additive auctions that <i>aggregate</i> the players’ knowledge, generating revenue that is a constant approximation to that of the optimal Bayesian mechanism with a common prior. Our mechanisms are 2-step dominant-strategy truthful and the approximation ratios improve gracefully with the amount of knowledge the players collectively have.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09718-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145160869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing node selection in search based multi-agent path finding","authors":"Md. Ahasanul Alam, Shekhar Mahmud, Md. Mamun-or-Rashid, Md. Mosaddek Khan","doi":"10.1007/s10458-025-09719-3","DOIUrl":"10.1007/s10458-025-09719-3","url":null,"abstract":"<div><p>The Multi-Agent Path Finding (MAPF) problem involves the task of finding paths for multiple agents that want to reach their destinations without obstructing other agents. Although MAPF is essential for numerous real-world applications, finding an optimal solution to this problem is NP-hard. Many approaches have been proposed in the literature, offering sub-optimal solutions to improve runtime efficiency. <i>Lazy Constraints Addition search for MAPF (LaCAM)</i> is a state-of-the-art sub-optimal MAPF algorithm that employs tree-based lazy successor generation to minimize planning effort. However, the success of the algorithm heavily relies on the effective selection of nodes for expansion. LaCAM employs a fixed heuristic throughout the entire search process, disregarding the agents’ preferences or characteristics of the underlying environment. Nevertheless, experiments with various heuristics indicate that no single heuristic consistently outperforms others across all scenarios. Consequently, in diverse environments, as the number of agents increases, reliance on a single, general heuristic leads to diminished runtime performance. Against this backdrop, with the intent to further speed up the runtime, we propose a novel approach, called eLaCAM, that adaptively selects nodes during the search process considering the current scenario of the environment and the agents’ preferences. We introduce two distinct variants of eLaCAM. The first, eLaCAM-stat, statistically analyzes previous results of using different heuristics and selects nodes accordingly. The second variant, eLaCAM-ML, analyzes the environment by extracting necessary features to guide a machine learning framework in assisting adaptive node selection during the search process. Our extensive empirical results illustrate a notable improvement in runtime and a reduction in the search space compared to state-of-the-art MAPF algorithms.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145169089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"POSGGym: a library for decision-theoretic planning and learning in partially observable, multi-agent environments","authors":"Jonathon Schwartz, Rhys Newbury, Dana Kulić, Hanna Kurniawati","doi":"10.1007/s10458-025-09716-6","DOIUrl":"10.1007/s10458-025-09716-6","url":null,"abstract":"<div><p>Seamless integration of Planning Under Uncertainty and Reinforcement Learning (RL) promises to bring the best of both model-driven and data-driven worlds to multi-agent decision-making, resulting in an approach with assurances on performance that scales well to more complex problems. Despite this potential, progress in developing such methods has been hindered by the lack of adequate evaluation and simulation platforms. Researchers have had to rely on creating custom environments, which reduces efficiency and makes comparing new methods difficult. In this paper, we introduce POSGGym: a library for facilitating planning and RL research in partially observable, multi-agent domains. It provides a diverse collection of discrete and continuous environments, complete with their dynamics models and a reference set of policies that can be used to evaluate generalization to novel co-players. Leveraging POSGGym, we empirically investigate existing state-of-the-art planning methods and a method that combines planning and RL in the type-based reasoning setting. Our experiments corroborate that combining planning and RL can yield superior performance compared to planning or RL alone, given the model of the environment and other agents is correct. However, our particular setup also reveals that this integrated approach could result in worse performance when the model of other agents is incorrect. Our findings indicate the benefit of integrating planning and RL in partially observable, multi-agent domains, while serving to highlight several important directions for future research. Code available at: https://github.com/RDLLab/posggym.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09716-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145167220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coordinating monetary contributions in participatory budgeting","authors":"Haris Aziz, Sujit Gujar, Manisha Padala, Mashbat Suzuki, Jeremy Vollen","doi":"10.1007/s10458-025-09715-7","DOIUrl":"10.1007/s10458-025-09715-7","url":null,"abstract":"<div><p>We formalize a framework for coordinating funding and selecting projects, the costs of which are shared among agents with quasi-linear utility functions and individual budgets. Our model contains the discrete participatory budgeting model as a special case, while capturing other useful scenarios. We propose several important axioms and objectives and study how well they can be simultaneously satisfied. We show that whereas welfare maximization admits an FPTAS, welfare maximization subject to a natural and very weak participation requirement leads to a strong inapproximability. This result is bypassed if we consider some natural restricted valuations, namely laminar single-minded valuations and symmetric valuations. Our analysis for the former restriction leads to the discovery of a new class of tractable instances for the Set Union Knapsack problem, a classical problem in combinatorial optimization.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09715-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145163196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diversity-seeking jump games in networks","authors":"Lata Narayanan, Yasaman Sabbagh, Alexandros A. Voudouris","doi":"10.1007/s10458-025-09714-8","DOIUrl":"10.1007/s10458-025-09714-8","url":null,"abstract":"<div><p>Recently, strategic games inspired by Schelling’s influential model of residential segregation have been studied in the TCS and AI literature. In these games, agents of <i>k</i> different types occupy the nodes of a network topology aiming to maximize their utility, which is a function of the fraction of same-type agents they are adjacent to in the network. As such, the agents exhibit similarity-seeking strategic behavior. In this paper, we introduce a class of strategic jump games in which the agents are <i>diversity-seeking</i>: The utility of an agent is defined as the fraction of its neighbors that are of <i>different</i> type than itself. We show that in general it is computationally hard to determine the existence of an equilibrium in such games. However, when the network is a tree, diversity-seeking jump games always admit an equilibrium assignment. For regular graphs and spider graphs with a single empty node, we prove a stronger result: The game is a potential game, that is, the improving response dynamics always converge to an equilibrium from any initial placement of the agents. We also show (nearly tight) bounds on the price of anarchy and price of stability in terms of the social welfare (the total utility of the agents).</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09714-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145163465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
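The diversity-seeking utility in the abstract above (fraction of neighbors of a different type) is simple to state in code. A minimal sketch that also searches for an improving jump to an empty node, the basic step of the improving response dynamics (all names illustrative; not the authors' code):

```python
from typing import Dict, List, Optional, Tuple

def utility(node: str, placement: Dict[str, Optional[str]], adj: Dict[str, List[str]]) -> float:
    """Fraction of occupied neighboring nodes holding a different type."""
    my_type = placement[node]
    occupied = [v for v in adj[node] if placement.get(v) is not None]
    if not occupied:
        return 0.0
    return sum(placement[v] != my_type for v in occupied) / len(occupied)

def find_improving_jump(placement: Dict[str, Optional[str]],
                        adj: Dict[str, List[str]]) -> Optional[Tuple[str, str]]:
    """Return (agent_node, empty_node) if some agent strictly gains by jumping, else None."""
    empties = [v for v, t in placement.items() if t is None]
    for node, agent_type in placement.items():
        if agent_type is None:
            continue
        current = utility(node, placement, adj)
        for empty in empties:
            trial = dict(placement)  # simulate the jump
            trial[node], trial[empty] = None, agent_type
            if utility(empty, trial, adj) > current:
                return node, empty
    return None

# Path a-b-c-d with one empty node: the X agent at 'a' only sees another X,
# so jumping to 'd' (next to the Y agent) strictly improves its utility.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
placement = {"a": "X", "b": "X", "c": "Y", "d": None}
print(find_improving_jump(placement, adj))  # → ('a', 'd')
```

Iterating this step until it returns `None` is exactly the improving response dynamics whose convergence the paper establishes for regular and spider graphs.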
{"title":"Hedonic seat arrangement problems","authors":"Hans L. Bodlaender, Tesshu Hanaka, Lars Jaffke, Hirotaka Ono, Yota Otachi, Tom C. van der Zanden","doi":"10.1007/s10458-025-09711-x","DOIUrl":"10.1007/s10458-025-09711-x","url":null,"abstract":"<div><p>In this paper, we study a variant of hedonic games, called <span>Seat Arrangement</span>. The model is defined by a bijection from agents with preferences for each other to vertices in a graph <i>G</i>. The utility of an agent depends on the neighbors assigned in the graph. More precisely, it is the sum, over all neighbors, of the preferences that the agent has towards the agents assigned to those neighbors. We first consider the price of stability and fairness for different classes of preferences. In particular, we show that there is an instance such that the price of fairness (PoF) is unbounded in general. Moreover, we show an upper bound <span>\(\tilde{d}(G)\)</span> and an almost tight lower bound <span>\(\tilde{d}(G)-1/4\)</span> on the PoF, where <span>\(\tilde{d}(G)\)</span> is the average degree of an input graph. Then we investigate the computational complexity of problems to find certain “good” seat arrangements, namely <span>Utilitarian Arrangement</span>, <span>Egalitarian Arrangement</span>, <span>Stable Arrangement</span>, and <span>Envy-free Arrangement</span>. We give dichotomies of computational complexity of four <span>Seat Arrangement</span> problems from the perspective of the maximum order of connected components in an input graph. For the parameterized complexity, <span>Utilitarian Arrangement</span> can be solved in time <span>\(n^{O(\gamma)}\)</span>, while it cannot be solved in time <span>\(f(\gamma)n^{o(\gamma)}\)</span> under ETH, where <i>n</i> is the number of agents and <span>\(\gamma\)</span> is the vertex cover number of an input graph. Moreover, we show that <span>Egalitarian Arrangement</span> and <span>Envy-free Arrangement</span> are weakly NP-hard even on graphs of bounded vertex cover number. Finally, we prove that determining whether a stable arrangement can be obtained from a given arrangement by <i>k</i> swaps is W[1]-hard when parameterized by <span>\(k+\gamma\)</span>, whereas it can be solved in time <span>\(n^{O(k)}\)</span>.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145163150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
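The utility model in the abstract above (an agent's utility is the sum of its preferences towards the agents seated at neighboring vertices) and the utilitarian objective can be sketched directly (all names and numbers illustrative; not the authors' code):

```python
from typing import Dict, List

def seat_utility(agent: str, seating: Dict[int, str],
                 pref: Dict[str, Dict[str, int]], adj: Dict[int, List[int]]) -> int:
    """Sum of `agent`'s preferences towards the agents on neighboring seats."""
    seat = next(s for s, a in seating.items() if a == agent)
    return sum(pref[agent][seating[s]] for s in adj[seat])

def utilitarian_welfare(seating: Dict[int, str],
                        pref: Dict[str, Dict[str, int]], adj: Dict[int, List[int]]) -> int:
    """Total utility over all agents: the Utilitarian Arrangement objective."""
    return sum(seat_utility(a, seating, pref, adj) for a in seating.values())

# Three seats on a path 0-1-2; preferences may be asymmetric.
adj = {0: [1], 1: [0, 2], 2: [1]}
pref = {
    "p": {"p": 0, "q": 2, "r": -1},
    "q": {"p": 1, "q": 0, "r": 1},
    "r": {"p": 0, "q": 3, "r": 0},
}
seating = {0: "p", 1: "q", 2: "r"}  # a bijection from seats to agents
print(utilitarian_welfare(seating, pref, adj))  # → 7
```

Here p gains 2 from sitting next to q, q gains 1+1 from its two neighbors, and r gains 3, for a welfare of 7; Utilitarian Arrangement asks for the bijection maximizing this sum.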
{"title":"Behavioral QLTL","authors":"Giuseppe De Giacomo, Giuseppe Perelli","doi":"10.1007/s10458-025-09712-w","DOIUrl":"10.1007/s10458-025-09712-w","url":null,"abstract":"<div><p>This paper introduces Behavioral QLTL, a “behavioral” variant of Linear Temporal Logic (<span>ltl</span>) with second-order quantifiers. Behavioral <span>qltl</span> is characterized by the fact that the functions that assign the truth value of the quantified propositions along the trace can only depend on the past. In other words, such functions must be “processes” (Abadi et al., Realizable and Unrealizable Specifications of Reactive Systems, 1989). This gives the logic a strategic flavor that we usually associate with planning. Indeed, we show that temporally extended planning in nondeterministic domains and ltl synthesis are expressed in Behavioral <span>qltl</span> through formulas with a simple quantification alternation. As such alternation increases, we get to forms of planning/synthesis in which contingent and conformant planning aspects get mixed. We study this logic from the computational point of view and compare it to the original <span>qltl</span> (with non-behavioral semantics) and simpler forms of behavioral semantics.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145171806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}