Finite Sample Analysis of Minmax Variant of Offline Reinforcement Learning for General MDPs

IEEE Open Journal of Control Systems, Pub Date: 2022-08-16, DOI: 10.1109/OJCSYS.2022.3198660

Jayanth Reddy Regatti; Abhishek Gupta

Abstract

In this work, we analyze finite sample complexity bounds for offline reinforcement learning with general state spaces, general function spaces, and state-dependent action sets. Unlike earlier works, the algorithm analyzed does not require knowledge of the data-collection policy. We show that one can compute an $\epsilon$-optimal Q function (state-action value function) using $O(1/\epsilon^{4})$ i.i.d. samples of state-action-reward-next-state tuples.
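
To give a feel for what an $O(1/\epsilon^{4})$ rate means in practice, the following minimal sketch computes how the required sample count grows as the target accuracy tightens. It is only an illustration of the scaling: the abstract states the order of the bound, not its constant, so the `constant` parameter and both function names are hypothetical and the absolute counts carry no meaning.

```python
import math


def samples_needed(epsilon: float, constant: float = 1.0) -> int:
    """Sample count under an ASSUMED bound n = constant / epsilon**4.

    `constant` is a hypothetical placeholder; the abstract gives only
    the O(1/epsilon^4) order, not the leading constant.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.ceil(constant / epsilon ** 4)


def scaling_factor(eps_old: float, eps_new: float) -> float:
    """Multiplicative growth in samples when accuracy goes eps_old -> eps_new.

    The unknown constant cancels, so this ratio follows from the rate alone.
    """
    if eps_old <= 0 or eps_new <= 0:
        raise ValueError("accuracies must be positive")
    return (eps_old / eps_new) ** 4


if __name__ == "__main__":
    # Compare the (illustrative) sample counts for two accuracy targets.
    for eps in (0.5, 0.25):
        print(f"epsilon={eps}: n ~ {samples_needed(eps)} (with constant=1)")
    print("growth when epsilon is halved:", scaling_factor(0.5, 0.25))
```

Because the constant cancels in the ratio, the one conclusion that does not depend on any assumption is the relative one: halving $\epsilon$ multiplies the sample requirement by $2^{4} = 16$.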
