A multi-step minimax Q-learning algorithm for two-player zero-sum Markov games

IF 6.5 · Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing · Pub Date: 2025-09-16 · DOI: 10.1016/j.neucom.2025.131552

Shreyas S.R. , Antony Vijesh

Citations: 0

Abstract

An iterative procedure, multi-step minimax Q-learning, is proposed to solve two-player zero-sum Markov games. Under suitable assumptions, the proposed iterates are proved to be bounded. Using results from stochastic approximation, the almost sure convergence of the proposed multi-step minimax Q-learning is then established: the algorithm converges with probability one to the game-theoretic optimal value even when the model information is not known. Numerical simulations demonstrate that the proposed algorithm is effective and easy to implement.
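The abstract does not state the multi-step update rule, so the sketch below is not the authors' algorithm. It illustrates, under stated assumptions, the classical one-step minimax Q-learning update (Littman, 1994) on which a multi-step variant presumably builds: each entry is moved toward `r + gamma * val(Q(s', ., .))`, where `val` is the value of the stage matrix game, and only sampled transitions are used. Everything here is a hypothetical illustration: the helper names (`matrix_game_value`, `make_game`, `shapley_values`, `minimax_q_learning`), the random two-state game, the discount `gamma = 0.5`, the step size `1/(k+1)**0.8`, the synchronous sweep over all `(s, a, b)`, and the restriction to a maximizing player with exactly two actions (which lets the stage game be solved without an LP solver). The learned values are compared against Shapley's model-based value iteration to show convergence to the game-theoretic value.

```python
import random

def matrix_game_value(m):
    """Value of the zero-sum matrix game m[a][b]: row player (a) maximizes,
    column player (b) minimizes. Assumption: exactly two row actions, so no LP
    solver is needed. The worst-case payoff of the mixed strategy (p, 1 - p) is
    concave and piecewise linear in p, so its maximum lies at p = 0, p = 1, or
    where two column payoff lines cross."""
    assert len(m) == 2, "this sketch only handles two row actions"
    cols = range(len(m[0]))
    candidates = {0.0, 1.0}
    for j in cols:
        for k in cols:
            sj, sk = m[0][j] - m[1][j], m[0][k] - m[1][k]  # slopes in p
            if j < k and sj != sk:
                p = (m[1][k] - m[1][j]) / (sj - sk)  # crossing point
                if 0.0 <= p <= 1.0:
                    candidates.add(p)
    return max(min(p * m[0][j] + (1 - p) * m[1][j] for j in cols)
               for p in candidates)

def make_game(n_states=2, n_b=2, seed=0):
    """Hypothetical random game: rewards R[s][a][b] in [-1, 1] and next-state
    distributions P[s][a][b]; the row player always has two actions."""
    rng = random.Random(seed)
    R, P = [], []
    for _ in range(n_states):
        R.append([[rng.uniform(-1, 1) for _ in range(n_b)] for _ in range(2)])
        P_s = []
        for _ in range(2):
            row = []
            for _ in range(n_b):
                w = [rng.random() + 1e-3 for _ in range(n_states)]
                total = sum(w)
                row.append([x / total for x in w])
            P_s.append(row)
        P.append(P_s)
    return R, P

def shapley_values(R, P, gamma, sweeps=200):
    """Model-based reference (Shapley's value iteration):
    V(s) <- val_{a,b}[ R[s][a][b] + gamma * sum_s' P[s][a][b][s'] * V(s') ]."""
    V = [0.0] * len(R)
    for _ in range(sweeps):
        V = [matrix_game_value(
                [[R[s][a][b] + gamma * sum(p * v for p, v in zip(P[s][a][b], V))
                  for b in range(len(R[s][a]))] for a in range(2)])
             for s in range(len(R))]
    return V

def minimax_q_learning(R, P, gamma, sweeps=20000, seed=1):
    """Model-free estimate via synchronous one-step minimax Q-learning:
    Q(s,a,b) <- Q(s,a,b) + alpha * (r + gamma * val(Q(s')) - Q(s,a,b)).
    P is used only to simulate next states; the update never reads it."""
    rng = random.Random(seed)
    n = len(R)
    Q = [[[0.0 for _ in R[s][a]] for a in range(2)] for s in range(n)]
    for k in range(sweeps):
        alpha = 1.0 / (k + 1) ** 0.8  # assumed Robbins-Monro step size
        V = [matrix_game_value(Q[s]) for s in range(n)]  # stage-game values
        for s in range(n):
            for a in range(2):
                for b in range(len(R[s][a])):
                    s_next = rng.choices(range(n), weights=P[s][a][b])[0]
                    target = R[s][a][b] + gamma * V[s_next]
                    Q[s][a][b] += alpha * (target - Q[s][a][b])
    return [matrix_game_value(Q[s]) for s in range(n)]

R, P = make_game()
print("Shapley (model-based):", shapley_values(R, P, gamma=0.5))
print("minimax Q (sampled):  ", minimax_q_learning(R, P, gamma=0.5))
```

The synchronous sweep (every `(s, a, b)` updated once per iteration from a simulator) is chosen only to keep the example short and deterministic under a fixed seed; an online learner would instead update the visited triple along a trajectory. For more than two row actions, `val` is normally computed by solving a small linear program.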


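Source journal

Neurocomputing (Engineering & Technology / Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Publication volume: 1382 articles

Review time: 70 days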

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.