{"title":"Augmenting the action space with conventions to improve multi-agent cooperation in Hanabi","authors":"Francois Bredell, Herman A. Engelbrecht, J. C. Schoeman","doi":"10.1007/s10458-025-09709-5","DOIUrl":"10.1007/s10458-025-09709-5","url":null,"abstract":"<div><p>The card game <i>Hanabi</i> is considered a strong medium for the testing and development of multi-agent reinforcement learning (MARL) algorithms, due to its cooperative nature, partial observability, limited communication and remarkable complexity. Previous research efforts have explored the capabilities of MARL algorithms within Hanabi, focusing largely on advanced architecture design and algorithmic manipulations to achieve state-of-the-art performance for various numbers of cooperators. However, this often leads to complex solution strategies with high computational cost that require large amounts of training data. To play Hanabi effectively, humans rely on conventions, which provide a means to implicitly convey ideas or knowledge based on a predefined, and mutually agreed upon, set of “rules” or principles. Multi-agent problems with partial observability, especially under limited communication, can benefit greatly from such implicit knowledge sharing. In this paper, we propose a novel approach that augments an agent’s action space with <i>conventions</i>, which act as sequences of special cooperative actions spanning multiple time steps and multiple agents, and which require agents to actively opt in for a convention to reach fruition. These <i>conventions</i> are based on existing human conventions and yield a significant improvement in the performance of existing techniques for self-play and cross-play with various numbers of cooperators within Hanabi.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09709-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144125540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
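The opt-in convention mechanism described in the abstract above can be sketched as a thin wrapper over a discrete action space. Everything below (class name, method names, the toy card actions) is an illustrative assumption, not the paper's actual implementation:

```python
# Sketch: augmenting a discrete action space with one extra "opt in to the
# convention" index. Once an agent opts in, the convention dictates the next
# primitive actions until its predefined sequence is exhausted.
class ConventionAugmentedActions:
    def __init__(self, base_actions, convention_steps):
        # base_actions: ordinary environment actions
        # convention_steps: predefined multi-step sequence (the convention)
        self.base_actions = list(base_actions)
        self.convention_steps = list(convention_steps)
        self.opted_in = False
        self.step_idx = 0

    @property
    def n_actions(self):
        # One extra action index signals opting in to the convention.
        return len(self.base_actions) + 1

    def resolve(self, action_idx):
        # While a convention is active, it overrides the chosen action.
        if self.opted_in:
            primitive = self.convention_steps[self.step_idx]
            self.step_idx += 1
            if self.step_idx == len(self.convention_steps):
                self.opted_in = False  # convention has reached fruition
                self.step_idx = 0
            return primitive
        if action_idx == len(self.base_actions):  # the augmented index
            self.opted_in = True
            return self.resolve(action_idx)
        return self.base_actions[action_idx]
```

In this toy reading, "opting in" simply commits the agent to the convention's remaining steps; the paper's multi-agent coordination around that commitment is not modeled here.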
{"title":"Entropy based blending of policies for multi-agent coexistence","authors":"David Rother, Franziska Herbert, Fabian Kalter, Dorothea Koert, Joni Pajarinen, Jan Peters, Thomas H. Weisswange","doi":"10.1007/s10458-025-09707-7","DOIUrl":"10.1007/s10458-025-09707-7","url":null,"abstract":"<div><p>Research on multi-agent interaction involving humans is still in its infancy. Most approaches have focused on environments with collaborative human behavior or a small, defined set of situations. When robots are deployed in human-inhabited environments in the future, the diversity of interactions will surpass the capabilities of pre-trained collaboration models. “Coexistence” environments, characterized by agents with varying or partially aligned objectives, present a unique challenge for robotic collaboration. Traditional reinforcement learning methods fall short in these settings: they lack the flexibility to adapt to changing agent counts or task requirements without retraining. Moreover, existing models do not adequately support scenarios where robots should exhibit helpful behavior toward others without compromising their primary goals. To tackle this issue, we introduce a novel framework that decomposes interaction and task-solving into separate learning problems and blends the resulting policies at inference time using a goal inference model for task estimation. We create impact-aware agents whose training cost scales linearly with the number of agents and available tasks. To this end, we propose a weighting function that blends the action distributions for individual interactions with the original task action distribution. To support our claims, we demonstrate that our framework scales in task and agent count across several environments and considers collaboration opportunities when present. The new learning paradigm opens the path to more complex multi-robot, multi-human interactions.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09707-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
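The title's entropy idea admits a minimal sketch: weight the interaction policy more heavily when the goal-inference distribution is low-entropy (i.e., confident), and fall back to the task policy when it is uncertain. The weighting rule and function names below are assumptions for illustration, not the authors' exact formulation:

```python
import math

def entropy(p):
    # Shannon entropy (in nats) of a discrete distribution p.
    return -sum(x * math.log(x) for x in p if x > 0)

def blend_policies(task_dist, interact_dist, goal_belief):
    """Blend a task policy with an interaction policy at inference time.

    The blend weight shrinks toward 0 as the goal belief becomes more
    uncertain (higher entropy), recovering the original task policy.
    """
    n = len(goal_belief)
    max_h = math.log(n) if n > 1 else 1.0
    w = 1.0 - entropy(goal_belief) / max_h  # 1 = certain, 0 = uniform belief
    return [w * q + (1.0 - w) * p for p, q in zip(task_dist, interact_dist)]
```

With a uniform goal belief the blend returns the task distribution unchanged; with a fully confident belief it returns the interaction distribution. The normalized-entropy weight is one simple choice; any monotone mapping from belief confidence to [0, 1] would serve the same role.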
{"title":"Distributed Course Allocation with Asymmetric Friendships","authors":"Lihi Dery, Tal Grinshpoun, Ilya Khakhiashvili","doi":"10.1007/s10458-025-09708-6","DOIUrl":"10.1007/s10458-025-09708-6","url":null,"abstract":"<div><p>Students’ decisions on whether to take a class are strongly affected by whether their friends plan to take it with them. A student may prefer to be assigned to a course they like less, just to be with their friends, rather than taking a more preferred class alone. Taking classes with friends has been shown to positively affect academic performance. Thus, academic institutions should take friendship relations into account when assigning course seats. Introducing friendship relations results in several non-trivial changes to current course allocation methods. This paper explores how course allocation mechanisms can account for <i>friendships</i> between students and provides a unique, distributed solution. Specifically, we frame the problem as an asymmetric distributed constraint optimization problem and develop a new dedicated algorithm. Our extensive evaluation includes both simulated data and a study involving 177 students, focusing on their preferences regarding both courses and friendships. The findings indicate that our algorithm achieves significant utility for the students while maintaining fairness and adhering to course seat capacities.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09708-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
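The "asymmetric friendships" in the title suggest a utility of roughly this shape: a base course preference plus directed friendship bonuses for friends sharing the course, where the bonus from A toward B need not equal the bonus from B toward A. The function and dictionary layout below are a hypothetical illustration, not the paper's model:

```python
def student_utility(student, assignment, course_prefs, friend_weight):
    """Utility of one student under a course assignment.

    assignment: dict student -> assigned course
    course_prefs: dict (student, course) -> base utility of that course
    friend_weight: dict (student, friend) -> directed friendship bonus;
        asymmetric, so friend_weight[(a, b)] need not equal
        friend_weight[(b, a)].
    """
    course = assignment[student]
    u = course_prefs[(student, course)]
    for (s, friend), w in friend_weight.items():
        # Bonus only for this student's own outgoing friendship edges
        # toward friends assigned to the same course.
        if s == student and assignment.get(friend) == course:
            u += w
    return u
```

The asymmetry is what pushes the problem into the asymmetric-DCOP framing the abstract mentions: two students sharing a course can value that sharing differently, so no single symmetric constraint cost captures both sides.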
{"title":"Correction: Epistemic selection of costly alternatives: the case of participatory budgeting","authors":"Simon Rey, Ulle Endriss","doi":"10.1007/s10458-025-09702-y","DOIUrl":"10.1007/s10458-025-09702-y","url":null,"abstract":"","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09702-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143883675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypertension and total-order forward decomposition optimizations","authors":"Maurício Cecílio Magnaguagno, Felipe Meneguzzi, Lavindra de Silva","doi":"10.1007/s10458-025-09705-9","DOIUrl":"10.1007/s10458-025-09705-9","url":null,"abstract":"<div><p>Hierarchical Task Network (HTN) planners generate plans using a decomposition process with extra domain knowledge to guide search toward solving a planning task. Domain experts develop such domain knowledge through recipes of how to decompose higher level tasks, specifying which tasks can be decomposed and under what conditions. In most realistic domains, such recipes contain recursions, i.e., tasks that can be decomposed into other tasks that contain the original task. Such domains require either that the domain expert tailor the domain knowledge to the specific HTN planning algorithm, or a planning algorithm that can search efficiently using the domain knowledge as given. By leveraging a three-stage compiler design we can easily support more language descriptions and preprocessing optimizations that, when chained, can greatly improve runtime efficiency in such domains. In this paper we evaluate such optimizations with the HyperTensioN HTN planner, winner of the HTN IPC 2020 total-order track.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09705-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving execution concurrency in partial-order plans via block-substitution","authors":"Sabah Binte Noor, Fazlul Hasan Siddiqui","doi":"10.1007/s10458-025-09706-8","DOIUrl":"10.1007/s10458-025-09706-8","url":null,"abstract":"<div><p>Partial-order plans in AI planning facilitate execution flexibility and several other tasks, such as plan reuse, modification, and decomposition, due to their less constrained nature. A Partial-Order Plan (POP) specifies a partial order over actions, providing the flexibility of executing unordered actions in different sequences. This flexibility can be further extended by enabling parallel execution of actions in the POP to reduce its overall execution time. While extensive studies exist on improving the flexibility of a POP by optimizing its action orderings through plan deordering and reordering, there has been limited focus on executing actions concurrently. Concurrency, i.e., the flexibility of executing actions in parallel, can be achieved in a POP by incorporating action non-concurrency constraints, which specify which actions cannot be executed in parallel. This work establishes necessary and sufficient conditions for non-concurrency constraints between two actions or two subplans with respect to a planning task. We also introduce an algorithm that improves a plan’s concurrency by optimizing resource utilization through substitution of the plan’s subplans with respect to the corresponding planning task. Our algorithm employs block deordering, which eliminates orderings in a POP by encapsulating coherent actions in blocks, and then exploits blocks as candidate subplans for substitution. Experiments on the benchmark problems from the International Planning Competitions (IPC) show considerable improvement in plan concurrency.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143861407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
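A common STRIPS-style sufficient condition for a non-concurrency constraint is interference: two actions cannot run in parallel if either one deletes a precondition or an add effect of the other. The data structures and the specific condition below are a generic illustration of that idea, not the necessary-and-sufficient conditions established in the paper:

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    pre: frozenset = frozenset()     # preconditions
    add: frozenset = frozenset()     # add effects
    delete: frozenset = frozenset()  # delete effects

def non_concurrent(a, b):
    """Sufficient interference test for a non-concurrency constraint:
    the pair conflicts if either action deletes a precondition or an
    add effect of the other."""
    def conflicts(x, y):
        return bool(x.delete & (y.pre | y.add))
    return conflicts(a, b) or conflicts(b, a)
```

An algorithm building a concurrent POP would add a non-concurrency constraint for every action pair this test flags, leaving the remaining unordered pairs free to execute in parallel.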
{"title":"Adaptation Procedure in misinformation games","authors":"Konstantinos Varsos, Merkouris Papamichail, Giorgos Flouris, Marina Bitsaki","doi":"10.1007/s10458-025-09704-w","DOIUrl":"10.1007/s10458-025-09704-w","url":null,"abstract":"<div><p>We study interactions between agents in multi-agent systems in which the agents are misinformed with regard to the game that they play, essentially having a subjective and incorrect understanding of the setting without being aware of it. To study this situation, we introduce a new game-theoretic concept, called misinformation games, that provides the necessary toolkit. Subsequently, we enhance this framework with a time-discrete procedure (called the Adaptation Procedure) that captures iterative interactions in the above context. During the Adaptation Procedure, the agents update their information and reassess their behaviour at each step. We demonstrate our ideas through an implementation, which we use to study the efficiency and characteristics of the Adaptation Procedure.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09704-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143668114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On fair and efficient solutions for budget apportionment","authors":"Pierre Cardi, Laurent Gourvès, Julien Lesca","doi":"10.1007/s10458-025-09694-9","DOIUrl":"10.1007/s10458-025-09694-9","url":null,"abstract":"<div><p>This article deals with an apportionment problem involving <i>n</i> agents and a common budget <i>B</i>. Each agent submits some demands which are indivisible portions of the budget, and a central authority has to decide which demands to accept. The utility of an agent corresponds to the total amount of her accepted demands. In this context, it is desirable to be fair among the agents and efficient by not wasting the budget. An ideal solution would be to spend exactly <i>B</i>/<i>n</i> for every agent, but this is rarely possible because of the indivisibility of the demands. Since combining fairness with efficiency is highly desirable but often impossible, we explore relaxed notions of fairness and efficiency in order to determine whether they can go together. Our approach is also constructive: polynomial-time algorithms that build fair and efficient solutions are given. The fairness criteria under consideration are the maximization of the minimum agent utility (max–min), proportionality, a customized notion of envy-freeness called jealousy-freeness, and the relaxations up to one or any demand of the previous two concepts. Efficiency in this work is either the maximization of the utilitarian social welfare or Pareto optimality. First, we consider fairness and efficiency separately, studying the existence and computation of solutions that are either fair or efficient. A complete picture of the relations that connect the fairness and efficiency concepts is provided. Second, we determine when fairness and efficiency can be combined for every possible instance. We prove that Pareto optimality is compatible with two notions of fairness, namely max–min and proportionality up to any demand. In contrast, none of the fairness concepts under consideration can be paired with the maximization of utilitarian social welfare. Therefore, we finally conduct a thorough analysis of the price of fairness, which bounds the loss of efficiency caused by imposing fairness or one of its relaxations.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143655386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
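One natural reading of "proportionality up to any demand" admits a direct check: an agent is satisfied if her utility reaches the proportional share <i>B</i>/<i>n</i>, if all her demands were accepted, or if adding any single rejected demand would reach the share. The paper's exact definition may differ; the helper below is an illustrative interpretation only:

```python
def proportional_up_to_any_demand(B, accepted, rejected):
    """Check an illustrative reading of 'proportionality up to any demand'.

    accepted[i], rejected[i]: lists of agent i's accepted / rejected
    demand sizes. Agent i is satisfied if her utility reaches B/n, if
    none of her demands were rejected, or if adding ANY single rejected
    demand would reach B/n.
    """
    n = len(accepted)
    share = B / n
    for acc, rej in zip(accepted, rejected):
        u = sum(acc)
        if u >= share or not rej:
            continue
        # Fails if even one rejected demand would leave her below B/n.
        if any(u + d < share for d in rej):
            return False
    return True
```

With B = 10 and two agents (share 5), an agent holding 3 with a rejected demand of 2 passes, since 3 + 2 reaches the share; holding 1 with a rejected demand of 1 fails.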
{"title":"“Provably fair” algorithms may perpetuate racial and gender bias: a study of salary dispute resolution","authors":"James Hale, Peter H. Kim, Jonathan Gratch","doi":"10.1007/s10458-025-09703-x","DOIUrl":"10.1007/s10458-025-09703-x","url":null,"abstract":"<div><p>Prior work suggests automated dispute resolution tools using “provably fair” algorithms can address disparities between demographic groups. These methods use multi-criteria elicited preferences from all disputants and satisfy constraints to generate “fair” solutions. However, we analyze the potential for inequity to permeate proposals through the preference elicitation stage. This possibility arises if dispositional attitudes differ between demographic groups and those dispositions affect elicited preferences. Specifically, risk aversion plays a prominent role in predicting preferences. Risk aversion predicts a weaker relative preference for <i>salary</i> and a softer within-issue utility for each issue; this leads to worse compensation packages for risk-averse groups. These results raise important questions in AI-value alignment about whether an AI mediator should take explicit preferences at face value.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09703-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the impact of direct punishment on the emergence of cooperation in multi-agent reinforcement learning systems","authors":"Nayana Dasgupta, Mirco Musolesi","doi":"10.1007/s10458-025-09698-5","DOIUrl":"10.1007/s10458-025-09698-5","url":null,"abstract":"<div><p>Solving the problem of cooperation is fundamentally important for the creation and maintenance of functional societies. Problems of cooperation are omnipresent within human society, with examples ranging from navigating busy road junctions to negotiating treaties. As the use of AI becomes more pervasive throughout society, the need for socially intelligent agents capable of navigating these complex cooperative dilemmas is becoming increasingly evident. Direct punishment is a ubiquitous social mechanism that has been shown to foster the emergence of cooperation in both humans and non-humans. In the natural world, direct punishment is often strongly coupled with partner selection and reputation and used in conjunction with third-party punishment. The interactions between these mechanisms could potentially enhance the emergence of cooperation within populations. However, no previous work has evaluated the learning dynamics and outcomes emerging from multi-agent reinforcement learning populations that combine these mechanisms. This paper addresses this gap. It presents a comprehensive analysis and evaluation of the behaviors and learning dynamics associated with direct punishment, third-party punishment, partner selection, and reputation. Finally, we discuss the implications of using these mechanisms on the design of cooperative AI systems.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09698-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143583433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}