{"title":"A3DQN: Adaptive Anderson Acceleration for Deep Q-Networks","authors":"Melike Ermis, Insoon Yang","doi":"10.1109/SSCI47803.2020.9308288","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) has been used for an agent to learn efficient decision-making strategies through its interactions with an environment. However, slow convergence and sample inefficiency of RL algorithms make them impractical for complex real-world problems. In this paper, we present an acceleration scheme, called Anderson acceleration (AA), for RL, where the value function in the next iteration is calculated using a linear combination of value functions in the previous iterations. Since the original AA method suffers from instability, we consider adaptive Anderson acceleration (A3) as a stabilized variant of AA, which contains both adaptive regularization to handle instability and safeguarding to enhance performance. We first apply A3 to value iteration for Q-functions and show its convergence property. To extend the idea of A3 to model-free deep RL, we devise a simple variant of deep Q-networks (DQN). Our experiments on the Atari 2600 benchmark demonstrate that the proposed method outperforms double DQN in terms of both final performance and learning speed.","PeriodicalId":413489,"journal":{"name":"2020 IEEE Symposium Series on Computational Intelligence (SSCI)","volume":"221 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Symposium Series on Computational Intelligence (SSCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SSCI47803.2020.9308288","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8
Abstract
Reinforcement learning (RL) enables an agent to learn efficient decision-making strategies through its interactions with an environment. However, the slow convergence and sample inefficiency of RL algorithms make them impractical for complex real-world problems. In this paper, we present an acceleration scheme for RL, called Anderson acceleration (AA), in which the value function at the next iteration is computed as a linear combination of the value functions from previous iterations. Since the original AA method suffers from instability, we consider adaptive Anderson acceleration (A3), a stabilized variant of AA that combines adaptive regularization to handle instability with safeguarding to enhance performance. We first apply A3 to value iteration for Q-functions and establish its convergence property. To extend the idea of A3 to model-free deep RL, we devise a simple variant of deep Q-networks (DQN). Our experiments on the Atari 2600 benchmark demonstrate that the proposed method outperforms double DQN in terms of both final performance and learning speed.
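The abstract does not spell out the paper's exact A3 update rule or hyperparameters, so the following is only a minimal sketch of the general idea on a tabular MDP: Anderson-accelerated Q-value iteration, where the next Q-function is a linear combination of the Bellman images of previous iterates, with a Tikhonov-style regularizer on the mixing coefficients and a residual-based safeguard that falls back to the plain Bellman update. The function names (`bellman_q`, `anderson_q_iteration`) and the specific regularization and safeguard choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bellman_q(Q, P, R, gamma):
    """One application of the Bellman optimality operator to a tabular Q-function.
    P: (S, A, S) transition probabilities, R: (S, A) expected rewards."""
    V = Q.max(axis=1)                                  # greedy state values
    return R + gamma * np.einsum("sat,t->sa", P, V)

def anderson_q_iteration(P, R, gamma, m=5, reg=1e-8, iters=500, tol=1e-8):
    """Sketch of Anderson-accelerated Q-value iteration (assumed formulation):
    regularized least-squares mixing of the last m iterates plus a safeguard."""
    S, A = R.shape
    Q = np.zeros((S, A))
    Q_hist, G_hist = [], []                            # iterates and their Bellman images
    for _ in range(iters):
        G = bellman_q(Q, P, R, gamma)
        if np.linalg.norm(G - Q) < tol:                # converged under the plain update
            return G
        Q_hist.append(Q.ravel()); G_hist.append(G.ravel())
        Q_hist, G_hist = Q_hist[-m:], G_hist[-m:]      # keep a window of m iterates

        F = np.stack(G_hist) - np.stack(Q_hist)        # residuals f_i = T(Q_i) - Q_i
        k = len(G_hist)
        # Minimize ||F^T alpha||^2 + reg * ||alpha||^2 subject to sum(alpha) = 1;
        # the regularizer plays the role of the adaptive stabilization in A3.
        M = F @ F.T + reg * np.eye(k)
        w = np.linalg.solve(M, np.ones(k))
        alpha = w / w.sum()
        Q_aa = (alpha @ np.stack(G_hist)).reshape(S, A)

        # Safeguard: accept the accelerated iterate only if its Bellman residual
        # is no larger than that of the plain update; otherwise fall back to G.
        res_aa = np.linalg.norm(bellman_q(Q_aa, P, R, gamma) - Q_aa)
        res_plain = np.linalg.norm(G - Q)
        Q = Q_aa if res_aa <= res_plain else G
    return Q
```

The same mixing-and-safeguarding idea is what the paper transfers to the model-free setting, where the Bellman images are replaced by target-network estimates inside a DQN-style learner; that extension is not reproduced here.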