Learning to Reach Optimal Equilibrium by Influence of Other Agents Opinion

D. Barrios-Aranibar, L. Gonçalves
DOI: 10.1109/HIS.2007.61
Published in: 7th International Conference on Hybrid Intelligent Systems (HIS 2007), 2007-09-17
Citations: 3

Abstract

In this work the authors extend the reinforcement learning paradigm for multi-agent systems called "influence value reinforcement learning" (IVRL). In previous work, an algorithm for repetitive games was proposed that outperformed traditional paradigms. Here, the authors define an algorithm based on this paradigm for use when agents must learn from delayed rewards: an influence value reinforcement learning algorithm for two-agent stochastic games. The IVRL paradigm is based on the social interaction of people, especially the fact that people communicate to one another what they think about each other's actions, and these opinions influence each other's behavior. A modified version of the Q-learning algorithm using this paradigm was constructed. The resulting IV Q-learning algorithm was implemented and compared with versions of Q-learning for independent learning and joint-action learning. Our approach is shown to have a higher probability of converging to an optimal equilibrium than the IQ-learning and JAQ-learning algorithms, especially when exploration increases.
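The abstract does not give the exact IVRL update rule, but the idea it describes, a Q-learning update augmented by the other agent's opinion of the chosen action, can be sketched as follows. All names, the scalar `opinion` signal, and the additive influence weight `BETA` are illustrative assumptions, not the paper's actual formulation:

```python
import random
from collections import defaultdict

# Hypothetical sketch of an influence-value Q-learning agent. The standard
# Q-learning target is kept, and an extra term derived from the other
# agent's communicated opinion nudges the value of the chosen action.

ALPHA, GAMMA, BETA = 0.1, 0.9, 0.05  # learning rate, discount, influence weight

class IVQLearner:
    def __init__(self, actions):
        self.actions = actions
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def choose(self, state, epsilon=0.1):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, opinion):
        # opinion: scalar in [-1, 1] sent by the other agent, expressing
        # approval (+) or disapproval (-) of this agent's action.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + GAMMA * best_next           # ordinary Q-learning target
        td_error = target - self.q[(state, action)]
        self.q[(state, action)] += ALPHA * td_error + BETA * opinion
```

With `BETA = 0`, this reduces to independent Q-learning (IQ-learning), which is the baseline the paper compares against; the opinion term is what couples the two learners' behavior.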