Self-Attention-Based Temporary Curiosity in Reinforcement Learning Exploration

Hangkai Hu, Shiji Song, Gao Huang
DOI: 10.1109/TSMC.2019.2957051
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 41, no. 1, pp. 5773-5784
Publication date: 2021-09-01
Citations: 5

Abstract

In many real-world scenarios, the extrinsic rewards provided by the environment are sparse. An agent trained with a classic reinforcement learning algorithm fails to explore such environments sufficiently and effectively. To address this problem, an exploration bonus derived from environmental novelty serves as intrinsic motivation for the agent. In recent years, curiosity-driven exploration has been a mainstream approach, describing environmental novelty through the prediction errors of dynamics models. Owing to the limited expressive ability of curiosity-based environmental novelty and the difficulty of finding an appropriate feature space, most curiosity-driven exploration methods suffer from overprotection against repetition. This problem can reduce the efficiency of exploration and lead the agent into a trap of local optimality. In this article, we propose a framework combining persisting curiosity and temporary curiosity to deal with the problem of overprotection against repetition. We introduce the self-attention mechanism from the field of computer vision and propose a sequence-based self-attention mechanism for temporary curiosity generation. We compare our framework with previous exploration methods in hard-exploration environments, provide a comprehensive analysis of the proposed framework, and investigate the effect of its individual components. The experimental results indicate that the proposed framework delivers superior performance over existing methods.
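The abstract describes the standard curiosity mechanism the paper builds on: a learned forward dynamics model predicts the next state, and the prediction error serves as an intrinsic exploration bonus. The sketch below is an illustrative, minimal version of that idea only, not the paper's persisting/temporary-curiosity method; the linear model and all names (`curiosity_bonus`, `train_step`, `W`) are hypothetical stand-ins for a learned dynamics network.

```python
# Minimal sketch of a prediction-error curiosity bonus (ICM-style),
# NOT the paper's exact framework. A linear forward model stands in
# for a neural dynamics model; its squared error is the bonus.
import numpy as np

rng = np.random.default_rng(0)

state_dim, action_dim = 4, 2
# Forward model: s' ~= W @ [s; a]
W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))

def curiosity_bonus(s, a, s_next, scale=1.0):
    """Intrinsic reward = scaled squared prediction error of the model."""
    pred = W @ np.concatenate([s, a])
    return scale * float(np.sum((s_next - pred) ** 2))

def train_step(s, a, s_next, lr=0.01):
    """One SGD step on the forward model, so transitions seen often
    become predictable and their bonus shrinks over time."""
    global W
    x = np.concatenate([s, a])
    err = (W @ x) - s_next
    W -= lr * np.outer(err, x)

s = rng.normal(size=state_dim)
a = rng.normal(size=action_dim)
s_next = rng.normal(size=state_dim)

before = curiosity_bonus(s, a, s_next)
for _ in range(200):
    train_step(s, a, s_next)
after = curiosity_bonus(s, a, s_next)
```

After repeated training on the same transition, `after` is much smaller than `before`: the bonus for a revisited transition decays toward zero. Pushed too far, this decay is exactly the "overprotection against repetition" the paper targets, since states near a necessary but already-visited path stop attracting the agent.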
Journal description: The scope of the IEEE Transactions on Systems, Man, and Cybernetics: Systems includes the fields of systems engineering. It includes issue formulation, analysis and modeling, decision making, and issue interpretation for any of the systems engineering lifecycle phases associated with the definition, development, and deployment of large systems. In addition, it includes systems management, systems engineering processes, and a variety of systems engineering methods such as optimization, modeling and simulation.