Distributing rewards by strategic knowledge based on Nash-Q learning

2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT) Pub Date: 2008-10-31 DOI: 10.1109/ICADIWT.2008.4664393

Kazuo Igoshi, T. Miura, I. Shioya


Citations: 2

Abstract

In this investigation, we examine a collaborative approach to reward distribution among multiple players in repeated general-sum stochastic games, in terms of position and rewards. Several investigations of reward distribution have been reported so far, and reinforcement learning has been considered useful because no knowledge is needed in advance and better decisions can be extracted while learning. Among these methods, Q-learning has received much attention in single-agent environments. In multi-agent environments, however, there is no clear objective for this problem: what is the optimal principle? In this work, we discuss how to distribute rewards thoroughly by treating the problem as a general stochastic game based on game theory. That is, we introduce the Nash-Q approach, which combines Nash equilibrium with Q-learning. We show that the new approach provides a new strategic solution. We discuss experiments on rather complicated games (game of life) to examine the usefulness of the approach.
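
To illustrate what "combines Nash equilibrium with Q-learning" means in practice, the following is a minimal sketch of a tabular Nash-Q update for two players. It is not the authors' implementation. It assumes each player keeps a Q-table indexed by state and joint action, and, as a simplification, that the stage game at the next state has a pure-strategy equilibrium (general Nash-Q allows mixed equilibria). The names pure_nash, nash_q_update, alpha, and gamma are illustrative and do not come from the paper.

```python
import numpy as np


def pure_nash(q1, q2):
    """Return the first pure-strategy Nash equilibrium (a1, a2) of the
    bimatrix stage game (q1, q2), or None if no pure equilibrium exists.

    q1[a1, a2] and q2[a1, a2] are the payoffs to players 1 and 2.
    """
    n1, n2 = q1.shape
    for a1 in range(n1):
        for a2 in range(n2):
            # a1 must be a best response to a2, and a2 a best response to a1.
            best1 = q1[a1, a2] >= q1[:, a2].max()
            best2 = q2[a1, a2] >= q2[a1, :].max()
            if best1 and best2:
                return a1, a2
    return None


def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.9):
    """Apply one Nash-Q update in place to both players' Q-tables.

    Q1 and Q2 have shape (n_states, n_actions_1, n_actions_2).
    Assumption (simplification): the stage game at s_next has a
    pure-strategy equilibrium; otherwise a ValueError is raised.
    """
    eq = pure_nash(Q1[s_next], Q2[s_next])
    if eq is None:
        raise ValueError("no pure-strategy equilibrium at the next state")
    e1, e2 = eq
    # Equilibrium payoffs at the next state replace the max of Q-learning.
    v1 = Q1[s_next, e1, e2]
    v2 = Q2[s_next, e1, e2]
    Q1[s, a1, a2] = (1 - alpha) * Q1[s, a1, a2] + alpha * (r1 + gamma * v1)
    Q2[s, a1, a2] = (1 - alpha) * Q2[s, a1, a2] + alpha * (r2 + gamma * v2)
```

The only change from single-agent Q-learning is the bootstrap term: instead of the maximum over a player's own actions, each player uses its payoff under an equilibrium of the stage game formed by both players' current Q-values at the next state.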



