Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle

IF 2.7; Zone 4, Computer Science; Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation Pub Date : 2024-08-19 DOI:10.1162/neco_a_01690

Theodore Jerome Tinker;Kenji Doya;Jun Tani

Neural Computation, vol. 36, no. 9, pp. 1854-1885. Journal Article. Source: https://ieeexplore.ieee.org/document/10661269/

Citations: 0

Abstract

In reinforcement learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards encouraging efficient exploration are the entropy of the action policy and curiosity for information gain. Entropy is well established in the literature and promotes randomized action selection. Curiosity is defined in a broad variety of ways in the literature and promotes the discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noise, known as curiosity traps. Based on the free energy principle (FEP), this letter proposes hidden state curiosity, which rewards agents according to the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity, and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find that entropy and curiosity result in efficient exploration, especially when both are employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests that implementing the FEP may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.
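As a rough illustration of the three intrinsic reward signals named in the abstract, the following Python sketch computes each one for a single time step. It is not the authors' implementation. It assumes diagonal Gaussian prior and posterior distributions over the latent variables, a KL divergence taken from posterior to prior, a mean squared error as the prediction error, and a discrete action distribution for the entropy term. The function names and the weights `alpha` and `eta` are hypothetical and chosen only for this example.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions.

    Assumption: q is the posterior and p is the predictive prior.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (
            math.log(sp / sq)
            + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
            - 0.5
        )
    return total


def prediction_error(predicted_obs, actual_obs):
    """Mean squared error between predicted and actual observations."""
    squared = [(p - a) ** 2 for p, a in zip(predicted_obs, actual_obs)]
    return sum(squared) / len(squared)


def policy_entropy(action_probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in action_probs if p > 0.0)


def total_reward(extrinsic, entropy, curiosity, alpha=0.1, eta=0.1):
    """Extrinsic reward plus weighted entropy and curiosity bonuses.

    alpha and eta are hypothetical weighting coefficients.
    """
    return extrinsic + alpha * entropy + eta * curiosity


# Hidden state curiosity: how far the posterior moved away from the prior.
hidden_state_curiosity = gaussian_kl([1.0], [1.0], [0.0], [1.0])

# Prediction error curiosity: how badly the observation was predicted.
prediction_error_curiosity = prediction_error([1.0, 2.0], [1.0, 4.0])

# Entropy bonus for a uniform choice between two actions.
entropy_bonus = policy_entropy([0.5, 0.5])

reward = total_reward(1.0, entropy_bonus, hidden_state_curiosity)
```

In this sketch the prediction error grows with any mismatch in the observation, including pure noise, whereas the KL term grows only when the latent belief itself is updated. That difference is the intuition behind the resilience to curiosity traps described in the abstract.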




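Source journal

Neural Computation (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 6.30

Self-citation rate: 3.40%

Publication count: not listed

Review time: 3.0 months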

Journal description: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.