Deconstructing Deep Active Inference: A Contrarian Information Gatherer

Impact Factor 2.7 · CAS Zone 4 (Computer Science) · JCR Q3, Computer Science, Artificial Intelligence
Théophile Champion, Marek Grześ, Lisa Bonheme, Howard Bowman
{"title":"Deconstructing Deep Active Inference: A Contrarian Information Gatherer","authors":"Théophile Champion;Marek Grześ;Lisa Bonheme;Howard Bowman","doi":"10.1162/neco_a_01697","DOIUrl":null,"url":null,"abstract":"Active inference is a theory of perception, learning, and decision making that can be applied to neuroscience, robotics, psychology, and machine learning. Recently, intensive research has been taking place to scale up this framework using Monte Carlo tree search and deep learning. The goal of this activity is to solve more complicated tasks using deep active inference. First, we review the existing literature and then progressively build a deep active inference agent as follows: we (1) implement a variational autoencoder (VAE), (2) implement a deep hidden Markov model (HMM), and (3) implement a deep critical hidden Markov model (CHMM). For the CHMM, we implemented two versions, one minimizing expected free energy, CHMM[EFE] and one maximizing rewards, CHMM[reward]. Then we experimented with three different action selection strategies: the ε-greedy algorithm as well as softmax and best action selection. According to our experiments, the models able to solve the dSprites environment are the ones that maximize rewards. On further inspection, we found that the CHMM minimizing expected free energy almost always picks the same action, which makes it unable to solve the dSprites environment. In contrast, the CHMM maximizing reward keeps on selecting all the actions, enabling it to successfully solve the task. The only difference between those two CHMMs is the epistemic value, which aims to make the outputs of the transition and encoder networks as close as possible. Thus, the CHMM minimizing expected free energy repeatedly picks a single action and becomes an expert at predicting the future when selecting this action. This effectively makes the KL divergence between the output of the transition and encoder networks small. Additionally, when selecting the action down the average reward is zero, while for all the other actions, the expected reward will be negative. Therefore, if the CHMM has to stick to a single action to keep the KL divergence small, then the action down is the most rewarding. We also show in simulation that the epistemic value used in deep active inference can behave degenerately and in certain circumstances effectively lose, rather than gain, information. As the agent minimizing EFE is not able to explore its environment, the appropriate formulation of the epistemic value in deep active inference remains an open question.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 11","pages":"2403-2445"},"PeriodicalIF":2.7000,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10810346/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Active inference is a theory of perception, learning, and decision making that can be applied to neuroscience, robotics, psychology, and machine learning. Recently, intensive research has been taking place to scale up this framework using Monte Carlo tree search and deep learning. The goal of this activity is to solve more complicated tasks using deep active inference. First, we review the existing literature and then progressively build a deep active inference agent as follows: we (1) implement a variational autoencoder (VAE), (2) implement a deep hidden Markov model (HMM), and (3) implement a deep critical hidden Markov model (CHMM). For the CHMM, we implemented two versions: one minimizing expected free energy, CHMM[EFE], and one maximizing rewards, CHMM[reward]. Then we experimented with three different action selection strategies: the ε-greedy algorithm as well as softmax and best action selection. According to our experiments, the models able to solve the dSprites environment are the ones that maximize rewards. On further inspection, we found that the CHMM minimizing expected free energy almost always picks the same action, which makes it unable to solve the dSprites environment. In contrast, the CHMM maximizing reward keeps selecting from all the actions, enabling it to successfully solve the task. The only difference between those two CHMMs is the epistemic value, which aims to make the outputs of the transition and encoder networks as close as possible. Thus, the CHMM minimizing expected free energy repeatedly picks a single action and becomes an expert at predicting the future when selecting this action. This effectively makes the KL divergence between the outputs of the transition and encoder networks small. Additionally, when selecting the action "down," the average reward is zero, while for all the other actions, the expected reward will be negative. Therefore, if the CHMM has to stick to a single action to keep the KL divergence small, then the action "down" is the most rewarding. We also show in simulation that the epistemic value used in deep active inference can behave degenerately and in certain circumstances effectively lose, rather than gain, information. As the agent minimizing EFE is not able to explore its environment, the appropriate formulation of the epistemic value in deep active inference remains an open question.
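The quantities discussed in the abstract can be sketched concretely. The snippet below is a minimal, hypothetical illustration rather than the authors' implementation: a diagonal-Gaussian encoder and transition network, an expected free energy that combines a reward term with an epistemic value expressed as the KL divergence between the two networks' outputs, and the three action-selection strategies (ε-greedy, softmax, best action). The network shapes, the direction of the KL, the action set, and the sign conventions are all assumptions made for illustration.

```python
# Hypothetical sketch of the quantities described in the abstract; names,
# shapes, and the action set are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
from torch.distributions import Normal, Categorical, kl_divergence

LATENT_DIM = 10
N_ACTIONS = 4  # e.g. up, down, left, right in dSprites (assumed ordering)


class GaussianHead(nn.Module):
    """Small linear head that outputs a diagonal Gaussian over the latent state."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Linear(in_dim, 2 * out_dim)
        self.out_dim = out_dim

    def forward(self, x: torch.Tensor) -> Normal:
        mean, log_var = self.net(x).split(self.out_dim, dim=-1)
        return Normal(mean, (0.5 * log_var).exp())


# Encoder q(s_t | o_t) and transition p(s_t | s_{t-1}, a_{t-1});
# both are placeholders with arbitrary input sizes.
encoder = GaussianHead(in_dim=64, out_dim=LATENT_DIM)
transition = GaussianHead(in_dim=LATENT_DIM + N_ACTIONS, out_dim=LATENT_DIM)


def expected_free_energy(obs_feat, prev_state, action_onehot, expected_reward):
    """EFE ~ -expected reward + epistemic term, where the epistemic term is
    KL[q(s_t | o_t) || p(s_t | s_{t-1}, a_{t-1})]: it is smallest when the
    encoder and transition outputs agree, which is the degeneracy the paper
    discusses (the agent can shrink it by always picking the same action)."""
    posterior = encoder(obs_feat)
    prior = transition(torch.cat([prev_state, action_onehot], dim=-1))
    epistemic = kl_divergence(posterior, prior).sum(dim=-1)
    return -expected_reward + epistemic


def select_action(efe_per_action: torch.Tensor, strategy: str = "softmax",
                  epsilon: float = 0.1) -> int:
    """The three action-selection strategies mentioned in the abstract,
    applied to a vector of EFE values, one entry per action."""
    scores = -efe_per_action  # lower EFE = better action
    if strategy == "best":
        return int(scores.argmax())
    if strategy == "epsilon_greedy":
        if torch.rand(()) < epsilon:
            return int(torch.randint(N_ACTIONS, ()))
        return int(scores.argmax())
    if strategy == "softmax":
        return int(Categorical(logits=scores).sample())
    raise ValueError(f"unknown strategy: {strategy}")
```

Under this sketch, a CHMM[reward]-style agent would simply drop the `epistemic` term (or score actions by expected reward alone), which matches the abstract's observation that removing the epistemic value is the only difference between the two CHMM variants.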
Source Journal

Neural Computation (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 6.30 · Self-citation rate: 3.40% · Articles per year: 83 · Review time: 3.0 months

Journal description: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: advanced experimental design; analysis of chemical sensor data; connectomic reconstructions; analysis of multielectrode and optical recordings; genetic data for cell identity; analysis of behavioral data; multiscale models; analysis of molecular mechanisms; neuroinformatics; analysis of brain imaging data; neuromorphic engineering; principles of neural coding, computation, circuit dynamics, and plasticity; theories of brain function.