Autonomous Agents and Multi-Agent Systems: Latest Articles

Logic-based cognitive planning for conversational agents
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-05-20. DOI: 10.1007/s10458-024-09646-9
Jorge Luis Fernandez Davila, Dominique Longin, Emiliano Lorini, Frédéric Maris
Abstract: This paper presents a novel approach to cognitive planning based on an NP-complete logic of explicit and implicit belief whose satisfiability checking problem is reduced to SAT. We illustrate the potential for application of our model by formalizing and then implementing a human–machine interaction scenario in which an artificial agent interacts with a human agent through dialogue and tries to motivate her to practice a sport. To make persuasion effective, the artificial agent needs a model of the human’s beliefs and desires, which is built during interaction through a sequence of belief revision operations. We consider two cognitive planning algorithms and compare their performance: a brute-force algorithm based on SAT and a QBF-based algorithm.
Citations: 0
Tackling school segregation with transportation network interventions: an agent-based modelling approach
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-05-20. DOI: 10.1007/s10458-024-09652-x
Dimitris Michailidis, Mayesha Tasnim, Sennay Ghebreab, Fernando P. Santos
Abstract: We address the emerging challenge of school segregation within the context of free school choice systems. Households take into account both proximity and demographic composition when deciding on which schools to send their children to, potentially exacerbating residential segregation. This raises an important question: can we strategically intervene in transportation networks to enhance school access and mitigate segregation? In this paper, we propose a novel, network agent-based model to explore this question. Through simulations in both synthetic and real-world networks, we demonstrate that enhancing school accessibility via transportation network interventions can lead to a reduction in school segregation, under specific conditions. We introduce group-based network centrality measures and show that increasing the centrality of certain neighborhood nodes with respect to a transportation network can be an effective strategy for strategic interventions. We conduct experiments in two synthetic network environments, as well as in an environment based on real-world data from Amsterdam, the Netherlands. In both cases, we simulate a population of representative agents emulating real citizens’ schooling preferences, and we assume that agents belong to two different groups (e.g., based on migration background). We show that, under specific homophily regimes in the population, school segregation can be reduced by up to 35%. Our proposed framework provides the foundation to explore how citizens’ preferences, school capacity, and public transportation can shape patterns of urban segregation.
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10458-024-09652-x.pdf
Citations: 0
Computing balanced solutions for large international kidney exchange schemes
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-05-16. DOI: 10.1007/s10458-024-09645-w
Márton Benedek, Péter Biró, Daniel Paulusma, Xin Ye
Abstract: To overcome incompatibility issues, kidney patients may swap their donors. In international kidney exchange programmes (IKEPs), countries merge their national patient–donor pools. We consider a recently introduced credit system. In each round, countries are given an initial “fair” allocation of the total number of kidney transplants. This allocation is adjusted by a credit function yielding a target allocation. The goal is to find a solution that approaches the target allocation as closely as possible, to ensure long-term stability of the international pool. As solutions, we use maximum matchings that lexicographically minimize the country deviations from the target allocation. We perform, for the first time, a computational study for a large number of countries. For the initial allocations we use two easy-to-compute solution concepts, the benefit value and the contribution value, and four classical but hard-to-compute concepts, the Shapley value, nucleolus, Banzhaf value and tau value. By using state-of-the-art software we show that the latter four concepts are now within reach for IKEPs of up to fifteen countries. Our experiments show that using lexicographically minimal maximum matchings instead of ones that only minimize the largest deviation from the target allocation (as previously done) may make an IKEP up to 54% more balanced.
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10458-024-09645-w.pdf
Citations: 0
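As a rough illustration of the selection rule mentioned in the kidney-exchange abstract above, the sketch below contrasts minimizing only the largest deviation with lexicographic minimization of all deviations. The target and candidate allocations are made-up numbers, and the helper names (`deviation_profile`, `minimax_choice`, `lexmin_choice`) are hypothetical. The paper optimizes over maximum matchings, which this sketch does not compute.

```python
# Illustrative only: candidates stand in for the per-country transplant
# counts of different maximum matchings (same total, different split).

def deviation_profile(allocation, target):
    """Absolute deviations from the target, sorted largest first."""
    return sorted((abs(a - t) for a, t in zip(allocation, target)), reverse=True)

def minimax_choice(candidates, target):
    """Pick a candidate whose largest deviation is smallest (ties: first seen)."""
    return min(candidates, key=lambda c: deviation_profile(c, target)[0])

def lexmin_choice(candidates, target):
    """Pick the candidate whose whole sorted deviation vector is lexicographically smallest."""
    return min(candidates, key=lambda c: deviation_profile(c, target))

target = [4.0, 3.0, 3.0]      # hypothetical target allocation for 3 countries
candidates = [
    [6, 1, 3],                # deviations sorted: [2, 2, 0]
    [6, 2, 2],                # deviations sorted: [2, 1, 1]
]

print(minimax_choice(candidates, target))  # [6, 1, 3]: both tie on the largest deviation
print(lexmin_choice(candidates, target))   # [6, 2, 2]: second-largest deviation breaks the tie
```

Both candidates have the same largest deviation, so a rule that looks only at that value cannot tell them apart, while the lexicographic rule prefers the one that is also closer on the remaining countries.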
Offline policy reuse-guided anytime online collective multiagent planning and its application to mobility-on-demand systems
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-05-16. DOI: 10.1007/s10458-024-09650-z
Wanyuan Wang, Qian Che, Yifeng Zhou, Weiwei Wu, Bo An, Yichuan Jiang
Abstract: The popularity of mobility-on-demand (MoD) systems boosts online collective multiagent planning (Online_CMP), where spatially distributed servicing agents are planned to meet dynamically arriving demands. For city-scale MoDs with a fleet of agents, Online_CMP methods must make a tradeoff between computation time (i.e., real-time) and solution quality (i.e., the number of demands served). Directly using an offline policy can guarantee real-time performance, but the policy cannot be dynamically adjusted to the real agent and demand distributions. Search-based online planning methods are adaptive, but are computationally expensive and cannot scale up. In this paper, we propose a principled Online_CMP method, which reuses and improves the offline policy in an anytime manner. We first model MoDs as a collective Markov Decision Process ($\mathbb{C}$-MDP) where the collective behavior of agents affects the joint reward. Given the $\mathbb{C}$-MDP model, we propose a novel state value function to evaluate the policy, and a gradient ascent (GA) technique to improve the policy. We further show that offline GA-based policy iteration (GA-PI) can converge to global optima of $\mathbb{C}$-MDP under certain conditions. Finally, with real-time information, the offline policy is used as the default plan, and GA-PI is used to improve it and generate an online plan. Experimental results show that our offline policy reuse-guided Online_CMP method significantly outperforms standard online multiagent planning methods on MoD systems like ride-sharing and security traffic patrolling in terms of computation time and solution quality.
Citations: 0
Controller synthesis for linear temporal logic and steady-state specifications
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-05-03. DOI: 10.1007/s10458-024-09648-7
Alvaro Velasquez, Ismail Alkhouri, Andre Beckus, Ashutosh Trivedi, George Atia
Abstract: The problem of deriving decision-making policies, subject to some formal specification of behavior, has been well-studied in the control synthesis, reinforcement learning, and planning communities. Such problems are typically framed in the context of a non-deterministic decision process, the non-determinism of which is optimally resolved by the computed policy. In this paper, we explore the derivation of such policies in Markov decision processes (MDPs) subject to two types of formal specifications. First, we consider steady-state specifications that reason about the infinite-frequency behavior of the resulting agent. This behavior corresponds to the frequency with which an agent visits each state as it follows its decision-making policy indefinitely. Second, we examine the infinite-trace behavior of the agent by imposing Linear Temporal Logic (LTL) constraints on the behavior induced by the resulting policy. We present an algorithm to find a deterministic policy satisfying LTL and steady-state constraints by characterizing the solutions as an integer linear program (ILP) and experimentally evaluate our approach. In our experimental results section, we evaluate the proposed ILP using MDPs with stochastic and deterministic transitions.
Citations: 0
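To make the notion of a steady-state specification in the abstract above more concrete, here is a minimal sketch that estimates the long-run visit frequencies of the Markov chain induced by a fixed policy and checks one frequency constraint. It uses plain power iteration and assumes the chain is ergodic; it is not the ILP formulation from the paper. The transition matrix `P`, the function name `steady_state`, and the 10% threshold are illustrative assumptions.

```python
def steady_state(P, iters=10_000, tol=1e-12):
    """Approximate pi with pi = pi * P by power iteration (assumes an ergodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical chain induced by some fixed policy on a 2-state MDP.
P = [[0.9, 0.1],
     [0.5, 0.5]]

pi = steady_state(P)
print(pi)               # close to [5/6, 1/6]

# Example steady-state constraint: "visit state 1 at least 10% of the time".
print(pi[1] >= 0.10)    # True
```

For this chain the balance condition 0.1 * pi[0] = 0.5 * pi[1] gives pi = (5/6, 1/6), which is what the iteration converges to.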
Mechanism design for public projects via three machine learning based approaches
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-04-20. DOI: 10.1007/s10458-024-09647-8
Mingyu Guo, Diksha Goel, Guanhua Wang, Runqi Guo, Yuko Sakurai, Muhammad Ali Babar
Abstract: We study mechanism design for nonexcludable and excludable binary public project problems. Our aim is to maximize the expected number of consumers and the expected agents’ welfare. We first show that for the nonexcludable public project model, there is no need for machine learning based mechanism design. We identify a sufficient condition on the prior distribution for the existing conservative equal costs mechanism to be the optimal strategy-proof and individually rational mechanism. For general distributions, we propose a dynamic program that solves for the optimal mechanism. For the excludable public project model, we identify a similar sufficient condition for the existing serial cost sharing mechanism to be optimal for 2 and 3 agents. We derive a numerical upper bound and use it to show that for several common distributions, the serial cost sharing mechanism is close to optimality. The serial cost sharing mechanism is not optimal in general. We propose three machine learning based approaches for designing better performing mechanisms. We focus on the family of largest unanimous mechanisms, which characterizes all strategy-proof and individually rational mechanisms for the excludable public project model. A largest unanimous mechanism describes an iterative mechanism, which is defined by an exponential number of mechanism parameters. Our first approach describes the largest unanimous mechanism family using a neural network and training is carried out by minimizing a cost function that combines the mechanism design objective and the constraint violation penalty. We interpret the largest unanimous mechanisms as price-oriented rationing-free (PORF) mechanisms, which enables us to move the mechanisms’ iterative decision making off the neural network, to a separate simulation process, therefore avoiding the vanishing gradient problem. We also feed the prior distribution’s analytical form into the cost function to achieve high-quality gradients for efficient training. Our second approach treats the mechanism design task as a Markov Decision Process with an exponential number of states. During the Markov decision process, the non-consumers are gradually removed from the system. We train multiple neural networks, each for a different number of remaining agents, to learn the optimal value function on the states. Training is carried out by supervised learning toward a set of manually prepared base cases and the Bellman equation. Our third approach is based on reinforcement learning for a Partially Observable Markov Decision Process. Each RL episode randomly draws a type profile, which is hidden from the RL agent (mechanism designer). The RL agent only observes which cost share offers have been accepted under the largest unanimous mechanism under discussion. We use a continuous action space reinforcement learning approach to adjust the offer policy (i.e., adjust mechanism parameters).
Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10458-024-09647-8.pdf
Citations: 0
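The serial cost sharing mechanism that the abstract above uses as its baseline can be sketched as follows, in the form it is commonly described for excludable binary public projects: offer every remaining agent an equal share of the cost, drop the agents who decline, and repeat until everyone left accepts. This is an assumption-laden illustration rather than code from the paper; the function name `serial_cost_sharing`, the cost normalized to 1, and the example valuations are all hypothetical.

```python
def serial_cost_sharing(values, cost=1.0):
    """Return (consumers, price_per_consumer) for an excludable binary project.

    values[i] is agent i's valuation for consuming the project.
    An agent accepts an offer iff its valuation is at least the offered share.
    """
    remaining = list(range(len(values)))
    while remaining:
        share = cost / len(remaining)                      # equal split among those left
        accepting = [i for i in remaining if values[i] >= share]
        if len(accepting) == len(remaining):               # unanimous acceptance: build
            return remaining, share
        remaining = accepting                              # non-consumers are removed
    return [], 0.0                                         # project is not built

# Four agents: the offers go 1/4, then 1/3, then 1/2.
print(serial_cost_sharing([0.6, 0.3, 0.1, 0.55]))   # ([0, 3], 0.5)
print(serial_cost_sharing([0.2, 0.2]))              # ([], 0.0)
```

The loop mirrors the "non-consumers are gradually removed" description in the abstract; the learned mechanisms in the paper replace the fixed equal-share offers with offers chosen from a larger parameterized family.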
Toward fast belief propagation for distributed constraint optimization problems via heuristic search
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-04-01. DOI: 10.1007/s10458-024-09643-y
Junsong Gao, Ziyu Chen, Dingding Chen, Wenxin Zhang, Qiang Li
Abstract: Belief propagation (BP) approaches, such as Max-sum and its variants, are important methods to solve large-scale Distributed Constraint Optimization Problems. However, these algorithms face a huge challenge since their computational complexity scales exponentially with the arity of each constraint function. Current accelerating techniques for BP use sorting or branch-and-bound (BnB) strategy to reduce the search space. However, the existing BnB-based methods are mainly designed for specific problems, which limits their applicability. On the other hand, though several generic sorting-based methods have been proposed, they require significantly high preprocessing as well as memory overhead, which prohibits their adoption in some realistic scenarios. In this paper, we aim to propose a series of generic and memory-efficient heuristic search techniques to accelerate belief propagation. Specifically, by leveraging dynamic programming, we efficiently build function estimations for every partial assignment scoped in a constraint function in the preprocessing phase. Then, by using these estimations to build upper bounds and employing a branch-and-bound in a depth-first fashion to reduce the search space, we propose our first method called FDSP. Next, we enhance FDSP by adapting a concurrent-search strategy and leveraging the upper bounds as guiding information and propose its first heuristic variant framework called CONC-FDSP. Finally, by choosing to expand the partial assignment with the highest upper bound in each step of exploration, we propose the second heuristic variant of FDSP, called BFS-FDSP. We prove the correctness of our methods theoretically, and our empirical evaluations indicate their superiority for accelerating Max-sum in terms of both time and memory, compared with the state-of-the-art.
Citations: 0
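The general idea in the abstract above (estimates for partial assignments built once by dynamic programming, then depth-first branch-and-bound using them as upper bounds) can be sketched as below. This is a simplified illustration and not FDSP itself: it stores an estimate for every prefix in a dictionary, so it does not reproduce the memory savings the paper reports. The names `build_estimates` and `bnb_max`, the table `f`, and the message values are hypothetical.

```python
from itertools import product

def build_estimates(f, domains):
    """est[prefix] = max of f over all completions of prefix, built bottom-up."""
    n = len(domains)
    est = {x: f[x] for x in product(*domains)}
    for k in range(n - 1, -1, -1):
        for prefix in product(*domains[:k]):
            est[prefix] = max(est[prefix + (v,)] for v in domains[k])
    return est

def bnb_max(est, domains, msgs):
    """Maximize f(x) + sum_k msgs[k][x_k] by depth-first branch-and-bound."""
    n = len(domains)
    # tail[k] = sum of the best message values for variables k..n-1
    tail = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        tail[k] = tail[k + 1] + max(msgs[k][v] for v in domains[k])
    best = [float("-inf"), None]

    def dfs(prefix, msg_sum):
        k = len(prefix)
        # Upper bound on any completion of this prefix; prune if it cannot win.
        if est[prefix] + msg_sum + tail[k] <= best[0]:
            return
        if k == n:                           # full assignment: the bound is exact
            best[0], best[1] = est[prefix] + msg_sum, prefix
            return
        for v in domains[k]:
            dfs(prefix + (v,), msg_sum + msgs[k][v])

    dfs((), 0.0)
    return best[0], best[1]

# Hypothetical ternary constraint over binary variables, given as a table.
domains = [(0, 1), (0, 1), (0, 1)]
f = {x: 2 * x[0] + x[1] * x[2] for x in product(*domains)}
msgs = [{0: 1.0, 1: 0.0}, {0: 0.0, 1: 0.5}, {0: 0.3, 1: 0.0}]

est = build_estimates(f, domains)            # done once, in preprocessing
print(bnb_max(est, domains, msgs))           # (3.5, (1, 1, 1))
```

The estimates depend only on the constraint table, so they can be reused each time the incoming messages change; only the cheap `tail` sums are recomputed per call.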
A survey of research on several problems in the RoboCup3D simulation environment
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-03-26. DOI: 10.1007/s10458-024-09642-z
Zhongye Gao, Mengjun Yi, Ying Jin, Hanwen Zhang, Yun Hao, Ming Yin, Ziwen Cai, Furao Shen
Abstract: In robot research and development, because hardware is fragile, a simulation environment is often used to verify and test algorithms first. The RoboCup3D simulation environment is built on the Open Dynamics Engine and models the humanoid robot NAO as its main robot, providing a simulation platform on which humanoid robot researchers can study robot movements. It is also the official platform of RoboCup 3D events. Under the rules of the soccer robot competition, it supports research on cooperation strategies for multiple robots, especially multiple humanoid robots. This paper summarizes the related research in the RoboCup3D simulation environment. It first introduces the basic problems in this simulation environment. Second, it reviews model-based and model-free research on robot motion generation and optimization in the simulation environment. It then introduces research on cooperation strategy design for multiple humanoid robots under RoboCup3D rules, including positioning and dynamic role assignment, and outlines typical practical solutions to these problems. Finally, it analyzes future trends in research on the RoboCup3D simulation environment.
Citations: 0
Model-free reinforcement learning for motion planning of autonomous agents with complex tasks in partially observable environments
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-03-26. DOI: 10.1007/s10458-024-09641-0
Junchao Li, Mingyu Cai, Zhen Kan, Shaoping Xiao
Abstract: Motion planning of autonomous agents in partially known environments with incomplete information is a challenging problem, particularly for complex tasks. This paper proposes a model-free reinforcement learning approach to address this problem. We formulate motion planning as a probabilistic-labeled partially observable Markov decision process (PL-POMDP) problem and use linear temporal logic (LTL) to express the complex task. The LTL formula is then converted to a limit-deterministic generalized Büchi automaton (LDGBA). The problem is redefined as finding an optimal policy on the product of PL-POMDP with LDGBA based on model-checking techniques to satisfy the complex task. We implement deep Q learning with long short-term memory (LSTM) to process the observation history and task recognition. Our contributions include the proposed method, the utilization of LTL and LDGBA, and the LSTM-enhanced deep Q learning. We demonstrate the applicability of the proposed method by conducting simulations in various environments, including grid worlds, a virtual office, and a multi-agent warehouse. The simulation results demonstrate that our proposed method effectively addresses environment, action, and observation uncertainties. This indicates its potential for real-world applications, including the control of unmanned aerial vehicles.
Citations: 0
Modeling and reinforcement learning in partially observable many-agent systems
IF 2.0, Zone 3, Computer Science
Autonomous Agents and Multi-Agent Systems, vol. 38, no. 1. Pub Date: 2024-03-26. DOI: 10.1007/s10458-024-09640-1
Keyang He, Prashant Doshi, Bikramjit Banerjee
Abstract: There is a prevalence of multiagent reinforcement learning (MARL) methods that engage in centralized training. These methods rely on all the agents sharing various types of information, such as their actions or gradients, with a centralized trainer or each other during the learning. Subsequently, the methods produce agent policies whose prescriptions and performance are contingent on other agents engaging in behavior assumed by the centralized training. But, in many contexts, such as mixed or adversarial settings, this assumption may not be feasible. In this article, we present a new line of methods that relaxes this assumption and engages in decentralized training resulting in the agent’s individual policy. The interactive advantage actor-critic (IA2C) maintains and updates beliefs over other agents’ candidate behaviors based on (noisy) observations, thus enabling learning at the agent’s own level. We also address MARL’s prohibitive curse of dimensionality due to the presence of many agents in the system. Under assumptions of action anonymity and population homogeneity, often exhibited in practice, large numbers of other agents can be modeled aggregately by the count vectors of their actions instead of individual agent models. More importantly, we may model the distribution of these vectors and its update using the Dirichlet-multinomial model, which offers an elegant way to scale IA2C to many-agent systems. We evaluate the performance of the fully decentralized IA2C along with other known baselines on a novel Organization domain, which we introduce, and on instances of two existing domains. Experimental comparisons with prominent and recent baselines show that IA2C is more sample efficient, more robust to noise, and can scale to learning in systems with up to a hundred agents.
Citations: 0
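The Dirichlet-multinomial modeling mentioned in the abstract above rests on a standard conjugate update: a Dirichlet prior over action probabilities plus an observed vector of action counts yields a Dirichlet posterior whose parameters are the prior parameters plus the counts. The sketch below shows only that textbook update, under the assumption that the counts are observed without noise; it is not the belief update used by IA2C. The function names and the numbers are illustrative.

```python
def dirichlet_update(alpha, counts):
    """Posterior Dirichlet parameters after observing multinomial action counts."""
    return [a + c for a, c in zip(alpha, counts)]

def predictive(alpha):
    """Posterior mean: probability that a randomly chosen agent takes each action."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Uniform prior over 3 actions; 10 anonymous agents observed choosing actions.
alpha = [1.0, 1.0, 1.0]
counts = [7, 2, 1]            # count vector: how many agents took each action

posterior = dirichlet_update(alpha, counts)
print(posterior)              # [8.0, 3.0, 2.0]
print(predictive(posterior))  # [8/13, 3/13, 2/13]
```

Because only the count vector matters, the cost of the update grows with the number of actions rather than the number of agents, which is the scaling property the abstract points to.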