{"title":"Correction: A survey of multi-agent deep reinforcement learning with communication","authors":"Changxi Zhu, Mehdi Dastani, Shihan Wang","doi":"10.1007/s10458-024-09644-x","DOIUrl":"10.1007/s10458-024-09644-x","url":null,"abstract":"","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"38 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-024-09644-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140148421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decentralized multi-agent path finding framework and strategies based on automated negotiation
M. Onur Keskin, Furkan Cantürk, Cihan Eran, Reyhan Aydoğan
Autonomous Agents and Multi-Agent Systems 38(1). Published 2024-03-13. DOI: 10.1007/s10458-024-09639-8
Open access PDF: https://link.springer.com/content/pdf/10.1007/s10458-024-09639-8.pdf

Abstract: This paper introduces a negotiation framework for solving the Multi-Agent Path Finding (MAPF) problem for self-interested agents in a decentralized fashion. The framework aims to achieve a good trade-off between the privacy of the agents and the effectiveness of solutions. Accordingly, a token-based bilateral negotiation protocol and two negotiation strategies are presented. Experimental results over four different settings of the MAPF problem show that the proposed approach can find conflict-free, albeit suboptimal, path solutions, especially when the search space is large and dense, where Explicit Estimation Conflict-Based Search (EECBS) struggles to find optimal solutions. Moreover, deploying a sophisticated negotiation strategy that uses information about local density to generate alternative paths yields remarkably better solution performance within this negotiation framework.

Equilibria in Schelling games: computational hardness and robustness
Luca Kreisel, Niclas Boehmer, Vincent Froese, Rolf Niedermeier
Autonomous Agents and Multi-Agent Systems 38(1). Published 2024-03-13. DOI: 10.1007/s10458-023-09632-7
Open access PDF: https://link.springer.com/content/pdf/10.1007/s10458-023-09632-7.pdf

Abstract: In the simplest game-theoretic formulation of Schelling's model of segregation on graphs, agents of two different types each select their own vertex in a given graph so as to maximize the fraction of agents of their type in their occupied neighborhood. Two ways of modeling agent movement here are either to allow two agents to swap their vertices or to allow an agent to jump to a free vertex. The contributions of this paper are twofold. First, we prove that deciding the existence of a swap-equilibrium and a jump-equilibrium in this simplest model of Schelling games is NP-hard, thereby answering questions left open by Agarwal et al. [AAAI '20] and Elkind et al. [IJCAI '19]. Second, we introduce two measures for the robustness of equilibria in Schelling games in terms of the minimum number of edges or the minimum number of vertices that need to be deleted to make an equilibrium unstable. We prove tight lower and upper bounds on the edge- and vertex-robustness of swap-equilibria in Schelling games on different graph classes.

Differentially private multi-agent constraint optimization
Sankarshan Damle, Aleksei Triastcyn, Boi Faltings, Sujit Gujar
Autonomous Agents and Multi-Agent Systems 38(1). Published 2024-03-02. DOI: 10.1007/s10458-024-09636-x

Abstract: Distributed constraint optimization (DCOP) is a framework in which multiple agents with private constraints (or preferences) cooperate to achieve a common goal optimally. DCOPs are applicable in several multi-agent coordination/allocation problems, such as vehicle routing, radio frequency assignment, and distributed meeting scheduling. However, optimization scenarios may involve multiple agents wanting to protect the privacy of their preferences. Researchers have proposed privacy-preserving algorithms for DCOPs that provide improved privacy protection through cryptographic primitives such as partially homomorphic encryption, secret sharing, and secure multiparty computation. These privacy benefits come at the expense of high computational complexity. Moreover, such approaches do not constitute a rigorous privacy guarantee for optimization outcomes, as the result of the computation itself may compromise agents' preferences. In this work, we show how to achieve privacy, specifically differential privacy, by randomizing the solving process. In particular, we present P-Gibbs, which adapts the current state-of-the-art algorithm for DCOPs, SD-Gibbs, to obtain differential privacy guarantees with much higher computational efficiency. Experiments on benchmark problems such as Ising, graph coloring, and meeting scheduling show P-Gibbs' privacy/performance trade-off for varying privacy budgets, with SD-Gibbs as the baseline. More concretely, we empirically show that P-Gibbs provides fair solutions for competitive privacy budgets.

{"title":"Contest partitioning in binary contests","authors":"Priel Levy, Yonatan Aumann, David Sarne","doi":"10.1007/s10458-024-09637-w","DOIUrl":"10.1007/s10458-024-09637-w","url":null,"abstract":"<div><p>In this work we explore the opportunities presented by partitioning contestants in contest into disjoint groups, each competing in an independent contest, with its own prize. This, as opposed to most literature on contest design, which focuses on the setting of a single “grand” (possibly multi-stage) contest, wherein all potential contestants ultimately compete for the same prize(s), with few exceptions that do consider contest partitioning, yet with conflicting preference results concerning the optimal structure to be used. Focusing on binary contests, wherein the quality of contestants’ submissions are endogenously determined, we show that contest partitioning is indeed beneficial under some condition, e.g., whenever the number of contestants, or the prize amount, are “sufficiently large”, where the exact size requirements are a function of the partitioning cost. When partitioning does not entail any cost, we show that it is either a dominating or weakly dominating strategy, depending on the way the organizer’s expected benefit is determined. The analysis is further extended to consider partitioning where some of the sub-contests used contain a single contestant (a singleton). We conclude that contest partitioning is an avenue that contest designers can and should consider, when aiming to maximize their profit.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"38 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-024-09637-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139988286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Landmark-based distributed topological mapping and navigation in GPS-denied urban environments using teams of low-cost robots","authors":"Mohammad Saleh Teymouri, Subhrajit Bhattacharya","doi":"10.1007/s10458-024-09635-y","DOIUrl":"10.1007/s10458-024-09635-y","url":null,"abstract":"<div><p>In this paper, we address the problem of autonomous multi-robot mapping, exploration and navigation in unknown, GPS-denied indoor or urban environments using a team of robots equipped with directional sensors with limited sensing capabilities and limited computational resources. The robots have no a priori knowledge of the environment and need to rapidly explore and construct a map in a distributed manner using existing landmarks, the presence of which can be detected using onboard senors, although little to no metric information (distance or bearing to the landmarks) is available. In order to correctly and effectively achieve this, the presence of a necessary density/distribution of landmarks is ensured by design of the urban/indoor environment. We thus address this problem in two phases: (1) During the design/construction of the urban/indoor environment we can ensure that sufficient landmarks are placed within the environment. To that end we develop a <i>filtration</i>-based approach for designing strategic placement of landmarks in an environment. (2) We develop a distributed algorithm which a team of robots, with no a priori knowledge of the environment, can use to explore such an environment, construct a topological map requiring no metric/distance information, and use that map to navigate within the environment. This is achieved using a topological representation of the environment (called a <i>Landmark Complex</i>), instead of constructing a complete metric/pixel map. The representation is built by the robot as well as used by them for navigation through a balanced strategy involving exploration and exploitation. We use tools from homology theory for identifying “<i>holes</i>” in the coverage/exploration of the unknown environment and hence guide the robots towards achieving a complete exploration and mapping of the environment. Our simulation results demonstrate the effectiveness of the proposed metric-free topological (simplicial complex) representation in achieving exploration, localization and navigation within the environment.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"38 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-024-09635-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139910530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards interactive explanation-based nutrition virtual coaching systems
Berk Buzcu, Melissa Tessa, Igor Tchappi, Amro Najjar, Joris Hulstijn, Davide Calvaresi, Reyhan Aydoğan
Autonomous Agents and Multi-Agent Systems 38(1). Published 2024-01-20. DOI: 10.1007/s10458-023-09634-5
Open access PDF: https://link.springer.com/content/pdf/10.1007/s10458-023-09634-5.pdf

Abstract: Awareness of healthy lifestyles is increasing, opening the door to personalized intelligent health coaching applications. A demand for more than mere suggestions and mechanistic interactions has drawn attention to nutrition virtual coaching systems (NVC) as a bridge between human-machine interaction and recommender, informative, persuasive, and argumentation systems. NVC can rely on opaque data-driven mechanisms. It is therefore crucial to enable NVC to explain their reasoning, i.e., to engage the user in discussions (via arguments) about dietary solutions/alternatives. By doing so, transparency, user acceptance, and engagement are expected to be boosted. This study focuses on NVC agents generating personalized food recommendations based on user-specific factors such as allergies, eating habits, lifestyles, and ingredient preferences. In particular, we propose a user-agent negotiation process entailing run-time feedback mechanisms to react to both recommendations and related explanations. Lastly, the study presents the findings of experiments conducted with participants from multiple backgrounds to evaluate the acceptability and effectiveness of the proposed system. The results indicate that most participants value the opportunity to provide feedback and receive explanations for recommendations, and that users appreciate receiving information tailored to their needs. Furthermore, our interactive recommendation system outperformed the corresponding traditional recommendation system in terms of effectiveness with respect to the number of agreements and rounds.

{"title":"A survey of multi-agent deep reinforcement learning with communication","authors":"Changxi Zhu, Mehdi Dastani, Shihan Wang","doi":"10.1007/s10458-023-09633-6","DOIUrl":"10.1007/s10458-023-09633-6","url":null,"abstract":"<div><p>Communication is an effective mechanism for coordinating the behaviors of multiple agents, broadening their views of the environment, and to support their collaborations. In the field of multi-agent deep reinforcement learning (MADRL), agents can improve the overall learning performance and achieve their objectives by communication. Agents can communicate various types of messages, either to all agents or to specific agent groups, or conditioned on specific constraints. With the growing body of research work in MADRL with communication (Comm-MADRL), there is a lack of a systematic and structural approach to distinguish and classify existing Comm-MADRL approaches. In this paper, we survey recent works in the Comm-MADRL field and consider various aspects of communication that can play a role in designing and developing multi-agent reinforcement learning systems. With these aspects in mind, we propose 9 dimensions along which Comm-MADRL approaches can be analyzed, developed, and compared. By projecting existing works into the multi-dimensional space, we discover interesting trends. We also propose some novel directions for designing future Comm-MADRL systems through exploring possible combinations of the dimensions.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"38 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-023-09633-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139111866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IOB: integrating optimization transfer and behavior transfer for multi-policy reuse","authors":"Siyuan Li, Hao Li, Jin Zhang, Zhen Wang, Peng Liu, Chongjie Zhang","doi":"10.1007/s10458-023-09630-9","DOIUrl":"10.1007/s10458-023-09630-9","url":null,"abstract":"<div><p>Humans have the ability to reuse previously learned policies to solve new tasks quickly, and reinforcement learning (RL) agents can do the same by transferring knowledge from source policies to a related target task. Transfer RL methods can reshape the policy optimization objective (optimization transfer) or influence the behavior policy (behavior transfer) using source policies. However, selecting the appropriate source policy with limited samples to guide target policy learning has been a challenge. Previous methods introduce additional components, such as hierarchical policies or estimations of source policies’ value functions, which can lead to non-stationary policy optimization or heavy sampling costs, diminishing transfer effectiveness. To address this challenge, we propose a novel transfer RL method that selects the source policy without training extra components. Our method utilizes the Q function in the actor-critic framework to guide policy selection, choosing the source policy with the largest one-step improvement over the current target policy. We integrate optimization transfer and behavior transfer (IOB) by regularizing the learned policy to mimic the guidance policy and combining them as the behavior policy. This integration significantly enhances transfer effectiveness, surpasses state-of-the-art transfer RL baselines in benchmark tasks, and improves final performance and knowledge transferability in continual learning scenarios. Additionally, we show that our optimization transfer technique is guaranteed to improve target policy learning.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"38 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138558138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uniformly constrained reinforcement learning","authors":"Jaeyoung Lee, Sean Sedwards, Krzysztof Czarnecki","doi":"10.1007/s10458-023-09607-8","DOIUrl":"10.1007/s10458-023-09607-8","url":null,"abstract":"<div><p>We propose new multi-objective reinforcement learning algorithms that aim to find a globally Pareto-optimal deterministic policy that uniformly (in all states) maximizes a reward subject to a uniform probabilistic constraint over reaching forbidden states of a Markov decision process. Our requirements arise naturally in the context of safety-critical systems, but pose a significant unmet challenge. This class of learning problem is known to be hard and there are no off-the-shelf solutions that fully address the combined requirements of determinism and uniform optimality. Having formalized our requirements and highlighted the specific challenge of learning instability, using a simple counterexample, we define from first principles a stable Bellman operator that we prove partially respects our requirements. This operator is therefore a partial solution to our problem, but produces conservative polices in comparison to our previous approach, which was not designed to satisfy the same requirements. We thus propose a relaxation of the stable operator, using <i>adaptive hysteresis</i>, that forms the basis of a heuristic approach that is stable w.r.t. our counterexample and learns policies that are less conservative than those of the stable operator and our previous algorithm. In comparison to our previous approach, the policies of our adaptive hysteresis algorithm demonstrate improved monotonicity with increasing constraint probabilities, which is one of the characteristics we desire. We demonstrate that adaptive hysteresis works well with dynamic programming and reinforcement learning, and can be adapted to function approximation.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"38 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138491279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}