Twin Delayed Hierarchical Actor-Critic

2021 7th International Conference on Automation, Robotics and Applications (ICARA). Pub Date: 2021-02-04. DOI: 10.1109/ICARA51699.2021.9376459

M. Anca, M. Studley

Citations: 3

Abstract

Hierarchical Reinforcement Learning (HRL) addresses the common problem in sparse-reward environments of having to manually craft a reward function. We present a modified version of the Hierarchical Actor-Critic (HAC) architecture called Twin Delayed HAC (TDHAC), a method capable of sample-efficient learning in environments requiring object interaction. The vanilla HAC algorithm fails to converge on this type of environment, while our method matches the best results reported so far in the literature. We carefully consider each feature added to the original architecture and demonstrate the abilities of TDHAC on the sparse-reward Pick-and-Place environment. To the best of our knowledge, this is the first HRL algorithm successfully applied to an environment requiring object interaction without external enhancements such as demonstrations.
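The abstract does not spell out the mechanics, so the following is only an illustrative sketch and not the authors' implementation. It assumes that "sparse reward" means a binary goal-reached signal, and that "Twin Delayed" refers to the clipped double-Q target used in TD3, where the bootstrap value is the smaller of two critic estimates. The function names, the 0.05 distance threshold, the 0/-1 reward convention, and the discount factor are all hypothetical choices made for the example.

```python
import math


def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Binary goal-conditioned reward (assumed convention, not from the paper).

    Returns 0.0 when the achieved goal lies within `threshold` of the desired
    goal (Euclidean distance) and -1.0 otherwise, so no shaping is needed.
    """
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance <= threshold else -1.0


def clipped_double_q_target(reward, q1_next, q2_next, done, gamma=0.98):
    """TD3-style critic target (assumed to be what 'twin' refers to).

    Bootstraps from the smaller of the two critics' next-state estimates to
    limit overestimation; terminal transitions do not bootstrap at all.
    """
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap


# Usage: a transition that has not reached the goal yet.
r = sparse_reward((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
target = clipped_double_q_target(r, q1_next=-2.0, q2_next=-3.0, done=False)
```

Taking the minimum of the two critics is a pessimistic choice: it trades a small underestimation bias for protection against the overestimation that a single critic tends to accumulate.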



