A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning

2021 Seventh Indian Control Conference (ICC). Pub Date: 2021-10-27. DOI: 10.1109/ICC54714.2021.9702912

Gugan Thoppe, Bhumesh Kumar

Citations: 3

Abstract

In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared sequential decision-making problem. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path on which the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than existing results, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: the gossip matrix need not be doubly stochastic, and the stepsizes need not be square summable.
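To make the setting concrete, the following is a minimal Python sketch of a generic distributed stochastic approximation iteration of the kind the abstract refers to. It is an illustration under stated assumptions, not the scheme analyzed in the paper: the three-agent gossip matrix `GOSSIP`, the local targets `TARGETS`, the `tanh` drift, the Gaussian noise, and the stepsize `1/sqrt(n+1)` are all hypothetical choices. They were picked so that the gossip matrix is row stochastic but not doubly stochastic, and so that the stepsizes are not square summable (their squares form the harmonic series).

```python
import math
import random

# Hypothetical 3-agent setup; none of these values come from the paper.
# Each row sums to 1 (row stochastic), but the columns do not,
# so the matrix is NOT doubly stochastic.
GOSSIP = [
    [0.5, 0.5, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.3, 0.7],
]

# Assumed local targets: agent i's drift pulls its iterate toward TARGETS[i].
TARGETS = [1.0, 2.0, 3.0]


def stepsize(n):
    """Assumed stepsize a_n = 1/sqrt(n+1); sum of a_n^2 diverges."""
    return 1.0 / math.sqrt(n + 1)


def run_distributed_sa(num_steps, seed=0, noise_std=0.5):
    """Run the illustrative scheme and return the agents' final iterates.

    Each step, every agent averages its neighbours' iterates using its
    row of GOSSIP, then adds a noisy nonlinear local update scaled by a_n.
    """
    rng = random.Random(seed)
    x = [0.0] * len(TARGETS)
    for n in range(num_steps):
        a_n = stepsize(n)
        # Gossip step: weighted average over neighbours.
        mixed = [sum(w * xj for w, xj in zip(row, x)) for row in GOSSIP]
        # Local step: nonlinear drift plus zero-mean Gaussian noise.
        x = [
            mixed[i]
            + a_n * (math.tanh(TARGETS[i] - x[i]) + rng.gauss(0.0, noise_std))
            for i in range(len(x))
        ]
    return x


if __name__ == "__main__":
    final = run_distributed_sa(20000)
    print("final iterates:", final)
    print("disagreement:", max(final) - min(final))
```

Running the script prints the agents' final iterates and the gap between the largest and smallest one. In this toy example the agents end up close to a common value even though the gossip weights are unbalanced; how fast the error shrinks along a single sample path is the kind of question a law of iterated logarithm addresses.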

