Knowledge transfer in multi-objective multi-agent reinforcement learning via generalized policy improvement

IF 1.2 · Zone 4 Computer Science · Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computer Science and Information Systems Pub Date : 2023-01-01 DOI:10.2298/csis221210071a

Almeida de, Lucas Alegre, Ana Bazzan


Citations: 0

Abstract

Even though many real-world problems are inherently distributed and multi-objective, most of the reinforcement learning (RL) literature deals with single agents and single objectives. While some of these problems can be solved using a single-agent single-objective RL solution (e.g., by specifying preferences over objectives), there are robustness issues, as well as the fact that preferences may change over time, or it might not even be possible to set such preferences. Therefore, a need arises for a way to train multiple agents for any given preference distribution over the objectives. This work thus proposes a multi-objective multi-agent reinforcement learning (MOMARL) method in which agents build a shared set of policies during training, in a decentralized way, and then combine these policies using a generalization of policy improvement and policy evaluation (fundamental operations of RL algorithms) to generate effective behaviors for any possible preference distribution, without requiring any additional training. This method is applied to two different application scenarios: a multi-agent extension of a domain commonly used in the related literature, and traffic signal control, which is more complex, inherently distributed and multi-objective (the flows of both vehicles and pedestrians are considered). Results show that the approach is able to effectively and efficiently generate behaviors for the agents, given any preference over the objectives.
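The abstract does not spell out how the policies are combined, so the following is only an illustrative sketch of generalized policy improvement (GPI) under stated assumptions, not the authors' implementation. It assumes that each policy in the shared set comes with a vector-valued action-value estimate (one entry per objective) for the current state, and that a preference is a weight vector used for linear scalarization. The names `scalarize`, `gpi_action`, and the toy table `Q` are hypothetical.

```python
from typing import Sequence

# Hypothetical layout: q_values[i][a] is the multi-objective value vector
# estimated for policy i when taking action a in the current state.
QTable = Sequence[Sequence[Sequence[float]]]


def scalarize(q_vec: Sequence[float], w: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of the per-objective values."""
    return sum(q * wi for q, wi in zip(q_vec, w))


def gpi_action(q_values: QTable, w: Sequence[float]) -> int:
    """Pick the action whose best scalarized value over all policies is highest."""
    n_actions = len(q_values[0])
    best_per_action = [
        max(scalarize(policy_q[a], w) for policy_q in q_values)
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=best_per_action.__getitem__)


# Toy example (assumed numbers): two policies, two actions, two objectives.
Q = [
    [[1.0, 0.0], [0.2, 0.2]],  # policy 0: action 0 is strong on objective 0
    [[0.1, 0.1], [0.0, 1.0]],  # policy 1: action 1 is strong on objective 1
]

action_for_obj0 = gpi_action(Q, [1.0, 0.0])  # preference fully on objective 0
action_for_obj1 = gpi_action(Q, [0.0, 1.0])  # preference fully on objective 1
```

Changing the weight vector changes the selected action without retraining any policy, which is the property the abstract describes. In a decentralized multi-agent setting, one plausible reading is that each agent applies such a rule to its own local state, but that is an assumption here.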


Source journal

Computer Science and Information Systems (COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING)

CiteScore

2.30

Self-citation rate

21.40%

Publication volume

Review time

7.5 months

About the journal: Computer Science and Information Systems (ComSIS) is an international refereed journal, published in Serbia. The objective of ComSIS is to communicate important research and development results in the areas of computer science, software engineering, and information systems.