{"title":"A multi-objective goal-oriented reinforcement learning algorithm for dynamic multi-objective sequential decision making","authors":"Haofang Yu, Hong-chuan Yang, Yanyan Huang","doi":"10.1007/s10458-026-09735-x","DOIUrl":"10.1007/s10458-026-09735-x","url":null,"abstract":"<div><p>Multi-objective reinforcement learning (MORL) algorithms predominantly rely on scalarization functions parameterized with the preferences of the decision maker to derive trade-off solutions. However, this is not always feasible or desirable in deterministic settings where scalarization functions are hard to specify, or where Pareto optimal solutions vary solely due to changes in the multi-objective reward function. Therefore, we consider a goal-augmented dynamic multi-objective Markov decision process (GA-DMOMDP), which enables the learning of Pareto optimal solutions through specifying and pursuing appropriate goals rather than relying on explicit scalarization functions. For such GA-DMOMDPs, we further propose a multi-objective goal-oriented reinforcement learning (MOGORL) algorithm that can track possibly changing Pareto optimal solutions. Our algorithm uses an online learning mode to continuously detect new goals, and simultaneously pursues different goals via a hindsight relabeling strategy.
Experimental results show that our algorithm learns Pareto optimal solutions in deterministic environments with either static or dynamically changing rewards, regardless of the shape of the Pareto optimal fronts, outperforming generalized MORL algorithms with linear and Chebyshev scalarization functions.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147337940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Green disclosure policies and market dynamics: evidence from agent-based ESG models","authors":"Lingxiao Zhao, Maria Polukarov, Carmine Ventre","doi":"10.1007/s10458-026-09734-y","DOIUrl":"10.1007/s10458-026-09734-y","url":null,"abstract":"<div><p>Green disclosure policies aim to improve the transparency of corporate environmental practices and guide investors’ capital allocation. While existing studies mostly examine firm-level effects, their market-level implications in multi-agent systems remain insufficiently explored. This paper develops a dual-market dynamic ESG fund model, integrating agent-based simulation with empirical game-theoretic analysis, to study how upgrade costs, investor valuation preferences, and disclosure regimes jointly shape firms’ green transition incentives in the EU and China. The results show that both transition costs and valuation gaps strongly influence strategic upgrading behaviour and equilibrium outcomes: Strict disclosure sharpens differentiation but may suppress upgrading due to high costs; lax disclosure facilitates initial transitions by polluting firms; and hybrid disclosure, combining lax and strict phases, generates stronger incentives across different firm types. Cross-market comparison further indicates that the EU’s mature regulatory environment is better suited to strict disclosure, whereas China’s emerging market benefits more from a lax form to accelerate early-stage transitions. 
This study provides a reference for regulators in selecting appropriate disclosure forms at different levels of market maturity and offers methodological support for the sustainable development of green finance markets.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-026-09734-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147337254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning the value systems of agents with preference-based and inverse reinforcement learning","authors":"Andrés Holgado-Sánchez, Holger Billhardt, Alberto Fernández, Sascha Ossowski","doi":"10.1007/s10458-026-09732-0","DOIUrl":"10.1007/s10458-026-09732-0","url":null,"abstract":"<div><p>Agreement Technologies refer to open computer systems in which autonomous software agents interact with one another, typically on behalf of humans, in order to come to mutually acceptable agreements. With the advance of AI systems in recent years, it has become apparent that such agreements, in order to be acceptable to the involved parties, must remain aligned with ethical principles and moral values. However, this is notoriously difficult to ensure, especially as different human users (and their software agents) may hold different value systems, i.e. they may differently weigh the importance of individual moral values. Furthermore, it is often hard to specify the precise meaning of a value in a particular context in a computational manner. Methods to estimate value systems based on human-engineered specifications, e.g. based on value surveys, are limited in scale due to the need for intense human moderation. In this article, we propose a novel method to automatically <i>learn</i> value systems from observations and human demonstrations. In particular, we propose a formal model of the <i>value system learning</i> problem, its instantiation to sequential decision-making domains based on multi-objective Markov decision processes, as well as tailored preference-based and inverse reinforcement learning algorithms to infer value grounding functions and value systems.
The approach is illustrated and evaluated by two simulated use cases.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-026-09732-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147336362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed concurrent triangulation for a group of low-cost robots with bearing-only sensing","authors":"Seoung Kyou Lee","doi":"10.1007/s10458-025-09728-2","DOIUrl":"10.1007/s10458-025-09728-2","url":null,"abstract":"<div><p>This paper presents distributed message-based triangulation that concurrently extracts a triangle lattice from an underlying robot network. The lattice is designed to be planar, so a robot inside a specific triangle can localize itself using only the bearing information of its local neighbors. To ensure the planarity of the lattice, we introduce two versions of triangulation. 1) The <i>default</i> version compares the unique IDs of two or more geometrically overlapped triangles, and only the triangle with the lowest ID survives. 2) The <i>edge-length minimization</i> version estimates the longest edge length of each overlapped triangle and elects only the triangle with the locally shortest one. For each version, we provide implementation details on a group of practical low-cost robots, r-one, together with extensive simulation results that validate its feasibility on the platform. We also provide theoretical analyses showing that our methods are scalable to any population of robots.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving scalability of multi-agent deep reinforcement learning with suboptimal human knowledge","authors":"Dingbang Liu, Fenghui Ren, Jun Yan, Guoxin Su, Wen Gu, Shohei Kato","doi":"10.1007/s10458-025-09729-1","DOIUrl":"10.1007/s10458-025-09729-1","url":null,"abstract":"<div><p>Due to its exceptional learning ability, multi-agent deep reinforcement learning (MADRL) has garnered widespread research interest. However, since the learning is data-driven and involves sampling from millions of steps, training a large number of agents is inherently challenging and inefficient. Inspired by the human learning process, we aim to transfer knowledge from humans to avoid starting from scratch. Given the growing emphasis on the Human-on-the-Loop concept, this study focuses on addressing the challenges of large-population learning by incorporating suboptimal human knowledge into the cooperative multi-agent environment. To leverage human experience, we integrate human knowledge into the training process of MADRL, representing it in natural language rather than specific action-state pairs. Compared to previous works, we further consider the attributes of transferred knowledge to assess its impact on algorithm scalability. Additionally, we examine several features of knowledge mapping to effectively convert human knowledge to the action space where agent learning occurs. To account for the disparity in knowledge construction between humans and agents, our approach allows agents to decide freely in which portions of the state space to leverage human knowledge. On the challenging domains of the StarCraft Multi-agent Challenge, our method successfully alleviates the scalability issue in MADRL. Furthermore, we find that, although individual-type knowledge significantly accelerates the training process, cooperative-type knowledge is more desirable for addressing a large agent population.
We hope this study provides valuable insights into applying and mapping human knowledge, ultimately enhancing the interpretability of agent behavior.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2026-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09729-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145930399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GR-MADRL: a multi-agent deep reinforcement learning framework with hamiltonian optimization for task offloading in vehicular fog computing","authors":"Samuel Akwasi Frimpong, Mu Han, Wenyi Zheng, Andrew Quansah","doi":"10.1007/s10458-025-09727-3","DOIUrl":"10.1007/s10458-025-09727-3","url":null,"abstract":"<div><p>Efficient task offloading strategies face considerable implementation barriers in vehicular fog computing (VFC) network contexts, due to dynamic vehicular mobility, fluctuating network conditions, and varying fog resource distribution. These complexities hinder efficient task offloading and resource utilization, leading to suboptimal quality of service (QoS) in terms of latency and energy efficiency. Existing deep reinforcement learning and optimization techniques struggle to adapt to these dynamic conditions, necessitating a more robust approach. This paper proposes an advanced task offloading framework that integrates gated recurrent unit-based multi-agent deep reinforcement learning (GR-MADRL) and dynamic Hamiltonian optimization (DHO). Our framework employs an attention-enhanced GRU network to process complex temporal network states, enabling effective feature prioritization and state prediction. Vehicles and fog nodes collaborate as autonomous agents through a multi-agent reinforcement learning system, supported by a central coordinator that constructs a global network view and computes optimal policies for local implementation. Additionally, we formulate the task offloading problem as a dynamic Hamiltonian optimization to maximize long-term rewards and ensure system stability. Extensive simulations demonstrate that GR-MADRL integrated with DHO significantly reduces task latency by 28.3%, lowers energy consumption by 35.1%, and improves task offloading success rates to 94.2%, outperforming baseline methods. 
These results highlight the potential of this approach to improve scalability, efficiency, and real-time decision making in VFC networks.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"40 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145779293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for building explainable collaborative multimodal dialogue systems using a theory of mind","authors":"Philip R. Cohen, Lucian Galescu, Maayan Shvo","doi":"10.1007/s10458-025-09717-5","DOIUrl":"10.1007/s10458-025-09717-5","url":null,"abstract":"<div><p>Eva is a multimodal conversational framework for building planning-based systems that help users accomplish their domain goals through collaborative dialogue. We argue that planning-based systems can do this by inferring users’ intentions, adopting goals to achieve them, developing plans to achieve those goals, detecting whether obstacles are present, finding plans to overcome them, and planning their actions, including speech acts. In the Eva framework, conversational agents can maintain and reason with their own beliefs, goals and intentions, and explicitly reason about those of their users. Belief reasoning is accomplished with a modal Horn-clause meta-interpreter. The planning and reasoning subsystems obey the principles of persistent goals and intentions, including the formation and decomposition of intentions to perform complex actions, as well as the conditions under which they can be given up. In virtue of its planning process, Eva treats its speech acts just like its other actions – physical acts affect physical states, digital acts affect digital states, and speech acts affect mental and social states. This general framework enables systems to plan a variety of speech acts including requests, informs, questions, confirmations, recommendations, offers, acceptances, greetings, and emotive expressions. Each of these has a formally specified semantics which is used during the planning and reasoning processes. Because Eva-based agents can keep track of different users’ mental states, they can engage in multi-party dialogues. Importantly, the framework supports systems’ explanations of their actions because they have created plans standing behind each of them. 
As a reaction to the near-universal focus on using large language models for every application, a trend has emerged recently towards integrated neuro-symbolic architectures. The Eva framework is an example of such an architecture in the area of collaborative dialogue systems.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09717-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145561724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strategic classification for non-uniform preferences using penalty labels and randomisation","authors":"Manish Kumar Singh, Ankur A. Kulkarni","doi":"10.1007/s10458-025-09725-5","DOIUrl":"10.1007/s10458-025-09725-5","url":null,"abstract":"<div><p>Strategic classification is the problem of classifying feature vectors that can be manipulated at the testing phase of the classifier. The classifier aims for its decision rule to be robust against strategic manipulation and also be efficiently learnable. In this paper, we present two main ideas for the classifier to achieve this goal. The first idea is to enrich the classifier’s decision-making capabilities in two ways: ignoring feature vectors and making randomized decisions. This approach helps the classifier classify more feature vectors correctly by limiting the sender’s options for manipulation. The second idea is to use the strategic structure of the problem during learning. This approach, in certain contexts, can simplify the learning process. Combining these ideas, we propose and compare two learning algorithms, Vanilla ERM and Strategy-aware ERM, and provide a heuristic for selecting the one that yields a classifier both robust to strategic manipulation and efficiently learnable.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145510826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The distortion of threshold approval matching","authors":"Mohamad Latifian, Alexandros A. Voudouris","doi":"10.1007/s10458-025-09724-6","DOIUrl":"10.1007/s10458-025-09724-6","url":null,"abstract":"<div><p>We study matching settings in which a set of agents have private utilities over a set of items. Each agent reports a partition of the items into approval sets of different threshold utility levels. Given this limited information on input, the goal is to compute an assignment of the items to the agents (subject to cardinality constraints depending on the application) that (approximately) maximizes the social welfare (the total utility of the agents for their assigned items). We first consider the well-known, simple one-sided matching problem in which each of <i>n</i> agents is to be assigned exactly one of <i>n</i> items. We show that with <i>t</i> threshold utility levels, the distortion of deterministic matching algorithms is <span>\(\Theta(\sqrt[t]{n})\)</span> while that of randomized algorithms is <span>\(\Theta(\sqrt[t+1]{n})\)</span>.
We then show that our distortion bounds extend to a more general setting in which there are multiple copies of the items, each agent can be assigned a number of items (even copies of the same one) up to a capacity, and the utility of an agent for an item depends on the number of its copies that the agent is given.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09724-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145315863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous facility location games with fractional preferences and limited resources","authors":"Jiazhu Fang, Qizhi Fang, Wenjing Liu, Minming Li","doi":"10.1007/s10458-025-09722-8","DOIUrl":"10.1007/s10458-025-09722-8","url":null,"abstract":"<div><p>In this paper, we study the heterogeneous facility location game with fractional preferences under resource constraints. In this model, a group of agents are positioned along the interval [0, 1], where each agent has position information and fractional preferences indicated as support weights for facilities. Our main focus is to design mechanisms that choose and locate one facility out of two facilities while motivating agents to truthfully report their information, aiming to approximately maximize the social utility, defined as the sum of utilities of all agents. Based on the types of private information held by agents, we consider three different settings. For the known-preferences setting, we provide a deterministic group strategy-proof mechanism with 2-approximation and a randomized group strategy-proof mechanism with <span>\(\frac{4}{3}\)</span>-approximation. We also provide lower bounds of 2 on the approximation ratio for any deterministic strategy-proof mechanism and 1.043 for any randomized strategy-proof mechanism. For the known-positions setting and the general setting, we present a deterministic group strategy-proof mechanism with 6-approximation and a randomized strategy-proof mechanism with 4-approximation, respectively. Furthermore, we give lower bounds of 1.554 for any deterministic strategy-proof mechanism and 1.2 for any randomized strategy-proof mechanism in the known-positions setting. Finally, we extend the model to the scenario of choosing <i>k</i> facilities out of <i>m</i> facilities. For the known-preferences setting, we provide a 2-approximate deterministic group strategy-proof mechanism, which is also the best deterministic strategy-proof mechanism.
For the known-positions setting, when <span>\(k \ge 2\)</span>, we give a lower bound of <span>\(2-\frac{1}{k}\)</span> for any deterministic strategy-proof mechanism.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 2","pages":""},"PeriodicalIF":2.6,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145315782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}