Navigating the unknown: Leveraging self-information and diversity in partially observable environments

IF 2.5 | CAS Zone 3 (Biology) | JCR Q3, BIOCHEMISTRY & MOLECULAR BIOLOGY
Devdhar Patel, Hava T. Siegelmann
Biochemical and Biophysical Research Communications, vol. 741, Article 150923. Published 2024-11-19. DOI: 10.1016/j.bbrc.2024.150923
https://www.sciencedirect.com/science/article/pii/S0006291X24014591
Citations: 0

Abstract

Reinforcement learning algorithms often struggle to learn in partially observable environments, where different states of the environment may appear identical. However, not all partially observable environments are equally difficult to learn in. This work introduces dissonance distance, a metric for estimating the difficulty of learning in such environments. We demonstrate that self-information, such as internal oscillations or memory of previous actions, can increase the dissonance distance and thereby make learning easier in partially observable environments. In addition, sensory occlusion may occur after learning has been completed, depriving the agent of sufficient information and leading to catastrophic failure. To address this, we propose a brain-inspired spatially layered architecture (SLA) that trains multiple policies in parallel for the same task. SLA can change the amount of external information processed at each timestep, providing an adaptive way to handle changing information in the environment state space. We evaluate SLA on the partially observable Continuous Mountain Car environment, showing that it learns successfully and is robust against realistic noise and occlusion in sensory inputs. We hypothesize that multi-policy approaches like SLA might explain complex dopamine dynamics in the brain that cannot be explained by the state-of-the-art scalar temporal-difference (TD) error.
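The following toy sketch shows how self-information can help. It does not implement dissonance distance, which the abstract does not define. Instead it uses a hypothetical stand-in: the number of observations under which an expert must take more than one action.

The sketch makes three assumptions:
- The environment is made partially observable by hiding velocity, so the agent sees only a discretized position.
- The dynamics follow the standard MountainCarContinuous constants.
- The expert is an assumed "energy-pumping" rule: push in the direction of motion.

Adding memory of the previous action to the observation should remove most of the aliasing. All function names here are illustrative, not the authors' code.

```python
import math
from collections import defaultdict

# Assumed MountainCarContinuous-style constants (standard Gym/Gymnasium defaults).
MIN_POS, MAX_POS, GOAL_POS = -1.2, 0.6, 0.45
MAX_SPEED, POWER = 0.07, 0.0015


def step(pos, vel, force):
    """One step of continuous Mountain Car dynamics (force clipped to [-1, 1])."""
    force = max(-1.0, min(1.0, force))
    vel += force * POWER - 0.0025 * math.cos(3 * pos)
    vel = max(-MAX_SPEED, min(MAX_SPEED, vel))
    pos = max(MIN_POS, min(MAX_POS, pos + vel))
    if pos == MIN_POS and vel < 0:
        vel = 0.0  # inelastic stop at the left wall
    return pos, vel


def rollout(max_steps=500):
    """Hypothetical expert: push in the direction of motion (energy pumping).

    Returns (position, previous_action, action) tuples; velocity itself is hidden.
    """
    pos, vel, prev_action = -0.5, 0.0, 0  # 0 = "no previous action yet"
    trace = []
    for _ in range(max_steps):
        action = 1 if vel >= 0 else -1  # the correct action depends on hidden velocity
        trace.append((pos, prev_action, action))
        pos, vel = step(pos, vel, action)
        prev_action = action
        if pos >= GOAL_POS:
            break
    return trace


def position_bin(pos, n_bins=36):
    """Discretize position into n_bins equal-width bins."""
    width = (MAX_POS - MIN_POS) / n_bins
    return min(n_bins - 1, int((pos - MIN_POS) / width))


def count_conflicts(trace, use_prev_action):
    """Count observations under which the expert took more than one action."""
    actions_seen = defaultdict(set)
    for pos, prev_action, action in trace:
        obs = (position_bin(pos), prev_action) if use_prev_action else position_bin(pos)
        actions_seen[obs].add(action)
    return sum(1 for acts in actions_seen.values() if len(acts) > 1)


trace = rollout()
print("position only:           ", count_conflicts(trace, use_prev_action=False))
print("position + prev. action: ", count_conflicts(trace, use_prev_action=True))
```

With position alone, every bin the car crosses in both directions is ambiguous. With the previous action appended, ambiguity remains only where the action actually switches, that is, at turnarounds. An internal oscillator phase could be appended the same way, but whether it disambiguates depends on how it is coupled to the dynamics.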
Source journal
Biochemical and Biophysical Research Communications (Biology: Biochemistry & Molecular Biology)
CiteScore: 6.10
Self-citation rate: 0.00%
Articles published: 1400
Review time: 14 days
Journal description: Biochemical and Biophysical Research Communications is the premier international journal devoted to the very rapid dissemination of timely and significant experimental results in diverse fields of biological research. The development of the "Breakthroughs and Views" section brings the minireview format to the journal, and issues often contain collections of special-interest manuscripts. BBRC is published weekly (52 issues/year). Research areas now include biochemistry, biophysics, cell biology, developmental biology, immunology, molecular biology, neurobiology, plant biology, and proteomics.