Speedy Categorical Distributional Reinforcement Learning and Complexity Analysis

IF 2.6 Q1 MATHEMATICS, APPLIED

SIAM journal on mathematics of data science Pub Date : 2022-06-01 DOI:10.1137/20m1364436

Markus Böck, C. Heitzinger

Citations: 2

Abstract

In distributional reinforcement learning, the entire distribution of the return instead of just the expected return is modeled. The approach with categorical distributions as the approximation method is well-known in Q-learning, and convergence results have been established in the tabular case. In this work, speedy Q-learning is extended to categorical distributions, a finite-time analysis is performed, and probably approximately correct bounds in terms of the Cramér distance are established. It is shown that also in the distributional case the new update rule yields faster policy evaluation in comparison to the standard Q-learning one and that the sample complexity is essentially the same as the one of the value-based algorithmic counterpart. Without the need for more state-action-reward samples, one gains significantly more information about the return with categorical distributions. Even though the results do not easily extend to the case of policy control, a slight modification to the update rule yields promising numerical results.
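For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of the Cramér distance between two categorical distributions that share a sorted support, taken here as the square root of the integrated squared difference of their cumulative distribution functions. The function name `cramer_distance` and its arguments are illustrative assumptions and are not taken from the paper; the paper's own normalization and notation may differ.

```python
import math
from itertools import accumulate


def cramer_distance(p, q, support):
    """Cramér (l2) distance between categorical distributions p and q.

    Assumption for this sketch: p and q are probability vectors over the
    same strictly increasing list of atoms `support`.
    """
    if not (len(p) == len(q) == len(support)):
        raise ValueError("p, q and support must have the same length")

    # Difference of the two CDFs evaluated at each atom.
    cdf_diff = list(accumulate(pi - qi for pi, qi in zip(p, q)))

    # Between consecutive atoms both CDFs are constant, so the integral of
    # the squared difference is a finite sum of (difference^2 * gap width).
    total = 0.0
    for k in range(len(support) - 1):
        width = support[k + 1] - support[k]
        total += cdf_diff[k] ** 2 * width
    return math.sqrt(total)
```

As a usage example, two point masses at opposite ends of the support `[0, 1, 2]` are at distance `sqrt(2)`, while identical distributions are at distance zero.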



