Offline policy reuse-guided anytime online collective multiagent planning and its application to mobility-on-demand systems

IF 2.6, Zone 3 (Computer Science), Q3 AUTOMATION & CONTROL SYSTEMS

Autonomous Agents and Multi-Agent Systems. Pub Date: 2024-05-16. DOI: 10.1007/s10458-024-09650-z

Wanyuan Wang, Qian Che, Yifeng Zhou, Weiwei Wu, Bo An, Yichuan Jiang

Volume 38, issue 1. https://link.springer.com/article/10.1007/s10458-024-09650-z

Citations: 0

Abstract

The popularity of mobility-on-demand (MoD) systems has spurred interest in online collective multiagent planning (Online_CMP), in which spatially distributed service agents are planned to meet dynamically arriving demands. For city-scale MoD systems with a fleet of agents, Online_CMP methods must trade off computation time (i.e., real-time responsiveness) against solution quality (i.e., the number of demands served). Directly using an offline policy guarantees real-time response, but the policy cannot be adjusted dynamically to the actual agent and demand distributions. Search-based online planning methods are adaptive, but they are computationally expensive and do not scale. In this paper, we propose a principled Online_CMP method that reuses and improves the offline policy in an anytime manner. We first model MoD systems as a collective Markov Decision Process (\(\mathbb{C}\)-MDP) in which the collective behavior of agents affects the joint reward. Given the \(\mathbb{C}\)-MDP model, we propose a novel state value function to evaluate the policy and a gradient ascent (GA) technique to improve it. We further show that offline GA-based policy iteration (GA-PI) can converge to a global optimum of the \(\mathbb{C}\)-MDP under certain conditions. Finally, given real-time information, the offline policy serves as the default plan, and GA-PI improves it to generate an online plan. Experimental results show that our offline policy reuse-guided Online_CMP method significantly outperforms standard online multiagent planning methods on MoD systems such as ride-sharing and security traffic patrolling, in terms of both computation time and solution quality.
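
The reuse-then-improve idea described in the abstract can be illustrated with a small toy sketch. This is not the paper's GA-PI algorithm or its \(\mathbb{C}\)-MDP value function, neither of which the abstract specifies. Everything here is an assumption made for illustration: a one-step, three-zone setting, a softmax-parameterized move policy, the saturating reward in `served`, and the names `anytime_improve`, `arrivals`, and `softmax_rows`. The sketch only shows the anytime pattern: start from an offline policy, take gradient ascent steps while a time budget lasts, and always return the best policy seen so far, so the result is never worse than the offline default.

```python
import math
import time

def softmax_rows(theta):
    """Row-wise softmax: turns unconstrained scores into move probabilities."""
    rows = []
    for row in theta:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

def arrivals(pi, agents):
    """Expected number of agents arriving in each zone under policy pi."""
    zones = len(agents)
    return [sum(agents[i] * pi[i][j] for i in range(zones)) for j in range(zones)]

def served(pi, agents, demand):
    """Toy collective reward (an assumption): smooth, saturating demands served.
    Every entry of demand must be positive."""
    m = arrivals(pi, agents)
    return sum(d * (1.0 - math.exp(-mj / d)) for mj, d in zip(m, demand))

def anytime_improve(offline_pi, agents, demand, budget_s=0.05, lr=0.5, max_iters=1000):
    """Gradient ascent from the offline policy; returns the best policy seen so far."""
    theta = [[math.log(p + 1e-12) for p in row] for row in offline_pi]
    best_pi, best_val = offline_pi, served(offline_pi, agents, demand)
    deadline = time.monotonic() + budget_s
    for _ in range(max_iters):
        if time.monotonic() >= deadline:
            break  # anytime property: stop whenever the budget runs out
        pi = softmax_rows(theta)
        m = arrivals(pi, agents)
        d_m = [math.exp(-mj / d) for mj, d in zip(m, demand)]  # dR/dm_j
        for i, row in enumerate(pi):
            d_pi = [agents[i] * g for g in d_m]  # dR/dpi[i][j]
            avg = sum(p * g for p, g in zip(row, d_pi))
            for k, p in enumerate(row):
                theta[i][k] += lr * p * (d_pi[k] - avg)  # softmax chain rule
        cand = softmax_rows(theta)
        val = served(cand, agents, demand)
        if val > best_val:  # keep the offline policy unless a step truly helps
            best_pi, best_val = cand, val
    return best_pi, best_val

# Hypothetical data: 10 agents crowded into zones 0 and 1, demand mostly elsewhere.
agents = [8.0, 2.0, 0.0]
demand = [2.0, 4.0, 4.0]
offline_pi = [[1.0 / 3.0] * 3 for _ in range(3)]  # uniform offline default
online_pi, online_val = anytime_improve(offline_pi, agents, demand)
print(served(offline_pi, agents, demand), online_val)
```

Because the loop compares every candidate against the best value so far, shrinking `budget_s` only reduces how much improvement is found; with a zero budget the function returns the offline policy unchanged.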

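Source journal

Autonomous Agents and Multi-Agent Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore

6.00

Self-citation rate

5.30%

Articles published

Review time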

>12 weeks

About the journal: This is the official journal of the International Foundation for Autonomous Agents and Multi-Agent Systems. It provides a leading forum for disseminating significant original research results in the foundations, theory, development, analysis, and applications of autonomous agents and multi-agent systems.

Coverage in Autonomous Agents and Multi-Agent Systems includes, but is not limited to: Agent decision-making architectures and their evaluation, including: cognitive models; knowledge representation; logics for agency; ontological reasoning; planning (single and multi-agent); reasoning (single and multi-agent). Cooperation and teamwork, including: distributed problem solving; human-robot/agent interaction; multi-user/multi-virtual-agent interaction; coalition formation; coordination. Agent communication languages, including: their semantics, pragmatics, and implementation; agent communication protocols and conversations; agent commitments; speech act theory. Ontologies for agent systems, agents and the semantic web, agents and semantic web services, Grid-based systems, and service-oriented computing. Agent societies and societal issues, including: artificial social systems; environments, organizations and institutions; ethical and legal issues; privacy, safety and security; trust, reliability and reputation. Agent-based system development, including: agent development techniques, tools and environments; agent programming languages; agent specification or validation languages. Agent-based simulation, including: emergent behavior; participatory simulation; simulation techniques, tools and environments; social simulation. Agreement technologies, including: argumentation; collective decision making; judgment aggregation and belief merging; negotiation; norms. Economic paradigms, including: auction and mechanism design; bargaining and negotiation; economically-motivated agents; game theory (cooperative and non-cooperative); social choice and voting. Learning agents, including: computational architectures for learning agents; evolution, adaptation; multi-agent learning. Robotic agents, including: integrated perception, cognition, and action; cognitive robotics; robot planning (including action and motion planning); multi-robot systems. Virtual agents, including: agents in games and virtual environments; companion and coaching agents; modeling personality, emotions; multimodal interaction; verbal and non-verbal expressiveness. Significant, novel applications of agent technology. Comprehensive reviews and authoritative tutorials of research and practice in agent systems. Comprehensive and authoritative reviews of books dealing with agents and multi-agent systems.