Achieving Socially Optimal Outcomes in Multiagent Systems with Reinforcement Social Learning

IF 2.2, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Autonomous and Adaptive Systems. Pub Date: 2013-09-01. DOI: 10.1145/2517329

Jianye Hao, Ho-fung Leung


Citations: 16

Abstract

In multiagent systems, social optimality is a desirable goal because it maximizes the global efficiency of the system. We study the problem of coordinating on socially optimal outcomes among a population of agents in which, in each round, each agent interacts with another agent chosen at random from the population. Previous work [Hales and Edmonds 2003; Matlock and Sen 2007, 2009] mainly resorts to changing the interaction protocol from random interaction to tag-based interaction and focuses only on symmetric games. Moreover, in previous work the agents’ decision-making processes are usually based on evolutionary learning, which tends to incur high communication cost and high deviation in the coordination rate. To solve these problems, we propose an alternative social learning framework with two major contributions. First, we introduce an observation mechanism to reduce the amount of communication required among agents. Second, we propose that the agents’ learning strategies be based on reinforcement learning instead of evolutionary learning. Each agent explicitly keeps a record of its current state in its learning strategy and learns its optimal policy for each state independently. As a result, the learning performance is much more stable, and the framework is suitable for both symmetric and asymmetric games. The performance of this social learning framework is extensively evaluated on a testbed of two-player general-sum games in comparison with previous work [Hao and Leung 2011; Matlock and Sen 2007]. The influence of different factors on the learning performance of the framework is investigated as well.
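To make the setting concrete, the following is a minimal sketch of random pairwise interaction with per-state reinforcement learning. It is not the authors' algorithm: the abstract does not give the update rule, so the sketch assumes a stateless Q-learning update, treats the agent's role (row or column player) as its "state", uses the joint payoff of the pair as the reward signal, and omits the observation mechanism entirely. The payoff matrix, the parameter values, and the names `PAYOFFS`, `socially_optimal_outcome`, and `run` are all hypothetical and chosen only for illustration.

```python
import random

ROW, COL = 0, 1  # the two roles an agent can hold; used here as its "state"

# Hypothetical asymmetric 2x2 general-sum game (illustrative values only).
# PAYOFFS[row_action][col_action] = (row_payoff, col_payoff)
PAYOFFS = [
    [(4, 2), (0, 5)],
    [(5, 0), (1, 1)],
]


def socially_optimal_outcome(payoffs):
    """Return the (row_action, col_action) cell with the largest payoff sum."""
    cells = [(r, c) for r in range(len(payoffs)) for c in range(len(payoffs[0]))]
    return max(cells, key=lambda rc: sum(payoffs[rc[0]][rc[1]]))


def run(num_agents=20, rounds=3000, alpha=0.1, epsilon=0.1, seed=0):
    """Simulate the population and return each agent's greedy (row, col) policy."""
    rng = random.Random(seed)
    n_actions = {ROW: len(PAYOFFS), COL: len(PAYOFFS[0])}
    # q[agent][role][action]: one independent value table per role.
    q = [{role: [0.0] * n_actions[role] for role in (ROW, COL)}
         for _ in range(num_agents)]

    def choose(agent, role, explore=True):
        values = q[agent][role]
        if explore and rng.random() < epsilon:
            return rng.randrange(len(values))  # epsilon-greedy exploration
        return max(range(len(values)), key=values.__getitem__)

    for _ in range(rounds):
        order = list(range(num_agents))
        rng.shuffle(order)  # random pairing; roles follow from shuffled position
        for i in range(0, num_agents - 1, 2):
            row_agent, col_agent = order[i], order[i + 1]
            a_row = choose(row_agent, ROW)
            a_col = choose(col_agent, COL)
            # Assumption: both agents are rewarded with the pair's joint payoff.
            reward = sum(PAYOFFS[a_row][a_col])
            q[row_agent][ROW][a_row] += alpha * (reward - q[row_agent][ROW][a_row])
            q[col_agent][COL][a_col] += alpha * (reward - q[col_agent][COL][a_col])

    return [(choose(a, ROW, explore=False), choose(a, COL, explore=False))
            for a in range(num_agents)]


if __name__ == "__main__":
    target = socially_optimal_outcome(PAYOFFS)
    policies = run()
    share = sum(p == target for p in policies) / len(policies)
    print("socially optimal outcome:", target)
    print("share of agents whose greedy policy matches it:", share)
```

Keeping a separate value table per role is what lets the same agent behave differently as row and column player, which is the property the abstract relies on for asymmetric games. In this toy game the socially optimal action happens to dominate under the joint-payoff reward, so the sketch says nothing about harder coordination problems.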


Source journal: ACM Transactions on Autonomous and Adaptive Systems (Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 4.80

Self-citation rate: 7.40%

Review time: >12 weeks

Journal description: TAAS addresses research on autonomous and adaptive systems being undertaken by an increasingly interdisciplinary research community, and provides a common platform under which this work can be published and disseminated. TAAS encourages contributions aimed at supporting the understanding, development, and control of such systems and of their behaviors. Contributions are expected to be based on sound and innovative theoretical models, algorithms, engineering and programming techniques, infrastructures and systems, or technological and application experiences.