{"title":"Quantification of transfer in reinforcement learning via regret bounds for learning agents","authors":"Adrienne Tuynman, Ronald Ortner","doi":"10.1007/s10458-026-09739-7","DOIUrl":"10.1007/s10458-026-09739-7","url":null,"abstract":"<div><p>We present an approach for the quantification of the usefulness of transfer in reinforcement learning via regret bounds for a multi-agent setting. Considering a number of <span>\\(\\aleph\\)</span> agents operating in the same Markov decision process, however possibly with different reward functions, we consider the regret each agent suffers with respect to an optimal policy maximizing its average reward. We show that when the agents share their observations the mutual regret of all agents is smaller by a factor of <span>\\(\\sqrt{\\aleph}\\)</span> compared to the case when each agent has to rely on the information collected by itself. This result demonstrates how considering the regret in multi-agent settings can provide theoretical bounds on the benefit of sharing observations in transfer learning.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-026-09739-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147441169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A task delegation model: An approach based on trustworthiness in sub-delegations and delegation chain formation","authors":"Jeferson José Baqueta, Cesar Augusto Tacla","doi":"10.1007/s10458-026-09741-z","DOIUrl":"10.1007/s10458-026-09741-z","url":null,"abstract":"<div><p>Task delegation is a fundamental mechanism adopted by agents to solve problems that involve teamwork. A critical issue in this context is trust establishment, in which agents must estimate the trustworthiness of potential partners based on their social behavior and environmental conditions. In the literature, most computational trust models address task delegation from a mono-episodic perspective, ignoring the possibility of sub-delegations and the resulting formation of delegation chains. Delegation chains enable the representation of complex social structures that capture agents’ dependency relationships. This work presents a task delegation model that explicitly supports sub-delegations through task decomposition and recursive delegation, while accounting for delegation chains. In the proposed model, agents select partners based on historical information about their performance, combined with social evaluations such as social image, reputation, and references. 
Experimental results show that, when compared to mono-episodic delegation approaches, the proposed model is particularly effective in dynamic environments where agents’ behavior may change over time.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-026-09741-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hotelling-Downs game for strategic candidacy with binary issues","authors":"Javier Maass Martínez, Vincent Mousseau, Anaëlle Wilczynski","doi":"10.1007/s10458-026-09737-9","DOIUrl":"10.1007/s10458-026-09737-9","url":null,"abstract":"<div><p>In a pre-election period, candidates may, in the course of the public political campaign, adopt a strategic behavior by modifying their advertised political views, to obtain a better outcome in the election. This situation can be modeled by a type of strategic candidacy game, close to the Hotelling-Downs framework, which has been investigated in previous works via political views that are positions in a common one-dimensional axis. However, the left-right axis cannot always capture the actual political stances of candidates. Therefore, we propose to model the political views of candidates as opinions over binary issues (e.g., for or against higher taxes, abortion, etc.), implying that the space of possible political views can be represented by a hypercube whose dimension is the number of issues. In this <i>binary strategic candidacy</i> game, we introduce the notion of local equilibrium, broader than the Nash equilibrium, which is a stable state with respect to candidates that can change their view on at most a given number of issues. We study the existence of local equilibria in our game and identify, in the case of two candidates, natural conditions under which the existence of an equilibrium is guaranteed. 
To complement our theoretical results, we provide experiments to empirically evaluate the existence of local equilibria and their quality.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147363124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metric distortion under group-fair objectives","authors":"Georgios Amanatidis, Elliot Anshelevich, Christopher Jerrett, Alexandros A. Voudouris","doi":"10.1007/s10458-026-09742-y","DOIUrl":"10.1007/s10458-026-09742-y","url":null,"abstract":"<div><p>We consider a voting problem in which a set of agents have metric preferences over a set of alternatives, and are also partitioned into disjoint groups. Given information about the preferences of the agents and their groups, our goal is to decide an alternative to approximately minimize an objective function that takes the groups of agents into account. We consider two natural group-fair objectives known as <i>Max-of-Avg</i> and <i>Avg-of-Max</i> which are different combinations of the max and the average cost in and out of the groups. We show tight bounds on the best possible <i>distortion</i> that can be achieved by various classes of mechanisms depending on the amount of information they have access to. In particular, we consider <i>full-information group-oblivious</i> mechanisms that do not know the groups but have access to the exact distances between agents and alternatives in the metric space, <i>ordinal-information group-oblivious</i> mechanisms that again do not know the groups but are given the ordinal preferences of the agents, and <i>group-aware</i> mechanisms that have full knowledge of the structure of the agent groups and also ordinal information about the metric space.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-03-04","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-026-09742-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147363123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized policy improvement for efficient and robust multi-objective reinforcement learning","authors":"Lucas N. Alegre, Ana L. C. Bazzan, Diederik M. Roijers, Ann Nowé, Bruno C. da Silva","doi":"10.1007/s10458-026-09736-w","DOIUrl":"10.1007/s10458-026-09736-w","url":null,"abstract":"<div><p>Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different <i>preferences</i> over (possibly conflicting) reward functions. These algorithms often learn a set of policies, each optimized for a particular agent preference, that are later reused when optimizing policies for different preferences. We introduce a novel algorithm that builds upon Generalized Policy Improvement (GPI) to construct principled, formally-derived prioritization schemes that improve sample efficiency. These correspond to active-learning strategies by which the agent can identify <i>(i)</i> the most promising preferences/objectives to train on at each moment; and <i>(ii)</i> the most relevant previous experiences to learn policies for new agent preferences through a novel Dyna-style MORL method. We prove our algorithm is guaranteed to always converge to an optimal solution in a finite number of steps, or an <span>\\(\\epsilon\\)</span>-optimal solution (for a bounded <span>\\(\\epsilon\\)</span>) if the agent can only identify sub-optimal policies. Our method monotonically improves the quality of its partial solutions while learning. We also introduce a bound that characterizes the maximum utility loss (with respect to the optimal solution) incurred by intermediate policies identified by our method during learning. Finally, we propose a novel epistemic uncertainty-aware extension of GPI that exploits high-confidence lower bounds to mitigate the impact of unreliable action-value estimates in GPI policies, and prove that it provides tighter performance bounds than the current state of the art. 
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-026-09736-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147363122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geometric freeze-tag problem","authors":"Sharareh Alipour, Arash Ahadi, Kajal Baghestani, Soroush Sahraei, Mahdis Mirzaei","doi":"10.1007/s10458-026-09738-8","DOIUrl":"10.1007/s10458-026-09738-8","url":null,"abstract":"<div><p>The Freeze-Tag Problem (FTP) involves activating a set of initially inactive robots as quickly as possible, starting from a single active robot. Once activated, a robot can assist in activating other robots. Each active robot moves at unit speed. The objective is to minimize the makespan, i.e., the time required to activate the last robot. A key performance measure is the wake-up ratio, defined as the maximum time needed to activate all of the robots in any initial configuration. This work focuses on the geometric (Euclidean) version of FTP in <span>\\(\\mathbb{R}^{d}\\)</span> under the <span>\\(\\ell_{p}\\)</span> norm, where the initial distance between each inactive robot and the single active robot is at most <span>\\(1\\)</span>. For <span>\\((\\mathbb{R}^{2}, \\ell_{2})\\)</span>, we improve the previous upper bound of <span>\\(4.62\\)</span> (Bonichon et al. [1], CCCG 2024) to <span>\\(4.31\\)</span>. The known lower bound for the wake-up ratio is <span>\\(3.82\\)</span>. In <span>\\(\\mathbb{R}^{3}\\)</span>, we propose a new strategy that achieves a wake-up ratio of <span>\\(12\\)</span> for <span>\\((\\mathbb{R}^{3}, \\ell_{1})\\)</span> and <span>\\(12.76\\)</span> for <span>\\((\\mathbb{R}^{3}, \\ell_{2})\\)</span>. We also explore the FTP in <span>\\((\\mathbb{R}^{3}, \\ell_{2})\\)</span> for specific instances where robots are positioned on the boundary of a sphere, providing further insights into practical scenarios. 
Finally, we demonstrate the practical efficiency of our <span>\\((\\mathbb{R}^{2}, \\ell_{2})\\)</span> algorithm through simulations on real-world spatial data.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147342561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From human teams to hybrid intelligence teams: identifying, characterizing, and evaluating foundational quality attributes","authors":"Davide Dell’Anna, Pradeep K. Murukannaiah, Mireia Yurrita, Bernd Dudzik, Davide Grossi, Catholijn M. Jonker, Catharine Oertel, Pınar Yolum","doi":"10.1007/s10458-025-09730-8","DOIUrl":"10.1007/s10458-025-09730-8","url":null,"abstract":"<div><p>Hybrid Intelligence (HI) is an emerging paradigm in which artificial intelligence (AI) augments human intelligence. The current literature lacks systematic models that guide the design and evaluation of HI systems. Further, discussions around HI primarily focus on technology, neglecting the holistic human-AI ensemble. In this paper, we take the initial steps toward the development of a quality model for characterizing and evaluating HI systems from a human-AI teams perspective. We first conducted a study investigating the adequacy of properties commonly associated with effective human teams to describe HI. The study features the insights of 50 HI researchers, and shows that various human team properties, including boundedness, interdependence, competency, purposefulness, initiative, normativity, and effectiveness, are important for HI systems. Based on these results, we developed a quality model for HI teams composed of seven high-level quality attributes, further refined into 16 specific ones. To evaluate the relevance and understanding of the proposed attributes, we conducted a second empirical investigation by staging competitions in which participants used the quality model to develop and analyze HI usage scenarios. 
Our analysis of 48 collected scenarios, which we openly release, confirms the proposed attributes’ relevance and highlights insights that emerge when designers consider the quality model in HI system design.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12916932/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147272776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximin shares under cardinality constraints","authors":"Halvard Hummel, Magnus Lie Hetland","doi":"10.1007/s10458-026-09731-1","DOIUrl":"10.1007/s10458-026-09731-1","url":null,"abstract":"<div><p>We study the problem of fair allocation of a set of indivisible items among agents with additive valuations, under cardinality constraints. In this setting, the items are partitioned into categories, each with its own limit on the number of items it may contribute to any bundle. We consider the fairness criterion known as the <i>maximin share</i> (MMS) <i>guarantee</i>, and propose a novel polynomial-time algorithm for finding 1/2-approximate MMS allocations for goods—an improvement from the previously best available guarantee of 11/30. For single-category instances, we show that a modified variant of our algorithm is guaranteed to produce 2/3-approximate MMS allocations. Among various other existence and non-existence results, we show that a <span>\\((\\sqrt{n}/(2\\sqrt{n} - 1))\\)</span>-approximate MMS allocation always exists for goods. For chores, we show similar results as for goods, with a 2-approximate algorithm in the general case and a 3/2-approximate algorithm for single-category instances. 
We extend the notions and algorithms related to <i>ordered</i> and <i>reduced instances</i> to work with cardinality constraints, and combine these with <i>bag filling</i> style procedures to construct our algorithms.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-026-09731-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147339589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient Bayesian learning-based opponent model considering parametric interrelation in automated bilateral multi-issue negotiation","authors":"Shengbo Chang, Katsuhide Fujita","doi":"10.1007/s10458-026-09733-z","DOIUrl":"10.1007/s10458-026-09733-z","url":null,"abstract":"<div><p>Opponent models that predict opponents’ utility functions can help achieve favorable outcomes in automated bilateral multi-issue negotiations. Bayesian learning-based opponent models are flexible and adaptable to various negotiation contexts. However, existing Bayesian learning-based opponent models compromise prediction accuracy for computational efficiency by assuming independent issues and specific utility function shapes. We propose a novel Bayesian learning-based opponent model that improves prediction accuracy while maintaining computational efficiency by relaxing the shape assumption and separately learning each parameter of the utility function. Although parameters are estimated independently, this removes structural constraints and increases mutual dependence between parameters during inference. Each parameter is estimated with its conditional expectation, conditioned on the estimates of the other parameters, and computed efficiently through an iterative learning algorithm. We further introduce a resampling method to mitigate degeneracy in the hypothesis space and maintain diversity. Experiments across 45 negotiation domains against seven temporal and 10 Automated Negotiating Agents Competition (ANAC) final-list agents show that the proposed model outperforms existing Bayesian learning-, frequency-, and value-based opponent models. 
Ablation results validate the effectiveness and synergy of the parametric interrelation consideration and the resampling method.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147339944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strategic (timed) computation tree logic","authors":"Jaime Arias, Wojciech Jamroga, Wojciech Penczek, Laure Petrucci, Teofil Sidoruk","doi":"10.1007/s10458-025-09726-4","DOIUrl":"10.1007/s10458-025-09726-4","url":null,"abstract":"<div><p>We define extensions of <span>\\(\\textbf{CTL}\\)</span> and <b>TCTL</b> with strategic operators, called Strategic <span>\\(\\textbf{CTL}\\)</span> (<b>SCTL</b>) and Strategic <b>TCTL</b> (<b>STCTL</b>), respectively. For each of the above logics we give a synchronous and asynchronous semantics, i.e. <b>STCTL</b> is interpreted over networks of extended Timed Automata (TA) that either make synchronous moves or synchronise via joint actions. We consider several semantics regarding information: imperfect (i) and perfect (I), and recall: imperfect (r) and perfect (R). We prove that <b>SCTL</b> is more expressive than <b>ATL</b> for all semantics, and this holds for the timed versions as well. Moreover, the model checking problem for <b>SCTL</b><span>\\(_{\\textbf{ir}}\\)</span> is of the same complexity as for <b>ATL</b><span>\\(_{\\textbf{ir}}\\)</span>, the model checking problem for <b>STCTL</b><span>\\(_{\\textbf{ir}}\\)</span> is of the same complexity as for <b>TCTL</b>, while for <b>STCTL</b><span>\\(_{\\textbf{iR}}\\)</span> it is undecidable as for <b>ATL</b><span>\\(_{\\textbf{iR}}\\)</span>. The above results suggest using <b>SCTL</b><span>\\(_{\\textbf{ir}}\\)</span> and <b>STCTL</b><span>\\(_{\\textbf{ir}}\\)</span> in practical applications. 
Therefore, we use the tool IMITATOR to support model checking of <b>STCTL</b><span>\\(_{\\textbf{ir}}\\)</span>.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09726-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147338054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}