Learning to Reach Optimal Equilibrium by Influence of Other Agents Opinion

D. Barrios-Aranibar, L. Gonçalves
DOI: 10.1109/HIS.2007.61
Published in: 7th International Conference on Hybrid Intelligent Systems (HIS 2007), 2007-09-17
Citations: 3

Abstract

In this work the authors extend the reinforcement learning paradigm for multi-agent systems called "influence value reinforcement learning" (IVRL). In previous work, an algorithm for repetitive games was proposed that outperformed traditional paradigms. Here, the authors define an algorithm based on this paradigm for use when agents must learn from delayed rewards: an influence value reinforcement learning algorithm for two-agent stochastic games. The IVRL paradigm is based on the social interaction of people, especially the fact that people communicate to one another what they think about each other's actions, and these opinions influence each other's behavior. A modified version of the Q-learning algorithm using this paradigm was constructed. The resulting IV Q-learning algorithm was implemented and compared with versions of Q-learning for independent learning and joint-action learning. Our approach is shown to have a higher probability of converging to an optimal equilibrium than the IQ-learning and JAQ-learning algorithms, especially when exploration increases.
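The abstract does not give the exact IVRL update rule, but the idea it describes, a Q-learning update augmented by the other agent's opinion of the chosen action, can be sketched as follows. All names, the scalar `opinion` signal, and the additive influence weight `BETA` are illustrative assumptions, not the paper's actual formulation:

```python
import random
from collections import defaultdict

# Hypothetical sketch of an influence-value Q-learning agent. The standard
# Q-learning target is kept, and an extra term derived from the other
# agent's communicated opinion nudges the value of the chosen action.

ALPHA, GAMMA, BETA = 0.1, 0.9, 0.05  # learning rate, discount, influence weight

class IVQLearner:
    def __init__(self, actions):
        self.actions = actions
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def choose(self, state, epsilon=0.1):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, opinion):
        # opinion: scalar in [-1, 1] sent by the other agent, expressing
        # approval (+) or disapproval (-) of this agent's action.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + GAMMA * best_next           # ordinary Q-learning target
        td_error = target - self.q[(state, action)]
        self.q[(state, action)] += ALPHA * td_error + BETA * opinion
```

With `BETA = 0`, this reduces to independent Q-learning (IQ-learning), which is the baseline the paper compares against; the opinion term is what couples the two learners' behavior.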