Influence-aware memory architectures for deep reinforcement learning in POMDPs.

Neural Computing and Applications. Pub Date: 2025-01-01. Epub Date: 2022-09-04. DOI: 10.1007/s00521-022-07691-7

Miguel Suau, Jinke He, Elena Congeduti, Rolf A N Starre, Aleksander Czechowski, Frans A Oliehoek

Pages: 13145-13161. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204899/pdf/

Abstract

Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNNs) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high-dimensional data. In this paper, we propose influence-aware memory, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q-values is inevitably fed back into the network for the next prediction, our model allows information to flow without necessarily being stored in the RNN's internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.
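
The two-branch idea described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the class name, layer sizes, the plain tanh recurrent cell, the random initialisation, and the hand-picked index list `recurrent_idx` are all assumptions made for illustration. In this sketch the feedforward branch sees the whole observation, only the selected variables reach the recurrent branch, and both outputs are concatenated to estimate Q-values.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map; `weights` is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init(rows, cols, rng):
    """Small random weight matrix and zero bias (illustrative initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return weights, [0.0] * rows


class InfluenceAwareMemorySketch:
    """Hypothetical two-branch Q-network: feedforward branch plus a small recurrent branch."""

    def __init__(self, obs_dim, recurrent_idx, ff_units, rnn_units, n_actions, seed=0):
        rng = random.Random(seed)
        # Assumption: the variables that need memory are chosen by hand.
        self.recurrent_idx = list(recurrent_idx)
        self.rnn_units = rnn_units
        self.W_ff, self.b_ff = init(ff_units, obs_dim, rng)
        self.W_in, self.b_in = init(rnn_units, len(self.recurrent_idx), rng)
        self.W_hh, self.b_hh = init(rnn_units, rnn_units, rng)
        self.W_q, self.b_q = init(n_actions, ff_units + rnn_units, rng)

    def initial_state(self):
        """All-zero hidden state for the start of an episode."""
        return [0.0] * self.rnn_units

    def step(self, obs, hidden):
        """Process one observation; return (Q-values, new hidden state)."""
        # Feedforward branch: information flows to the output without being stored.
        ff = [math.tanh(v) for v in linear(self.W_ff, self.b_ff, obs)]
        # Recurrent branch: only the selected variables enter the memory.
        selected = [obs[i] for i in self.recurrent_idx]
        from_input = linear(self.W_in, self.b_in, selected)
        from_memory = linear(self.W_hh, self.b_hh, hidden)
        new_hidden = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
        # Q-values are estimated from the concatenation of both branches.
        q_values = linear(self.W_q, self.b_q, ff + new_hidden)
        return q_values, new_hidden


# Usage: roll the network over a short observation sequence.
net = InfluenceAwareMemorySketch(obs_dim=6, recurrent_idx=[0, 1],
                                 ff_units=4, rnn_units=3, n_actions=2)
h = net.initial_state()
for obs in ([0.1] * 6, [0.2] * 6):
    q, h = net.step(obs, h)
```

Because only `obs[0]` and `obs[1]` reach the recurrent branch here, changing any other observation variable alters the Q-values through the feedforward branch but leaves the hidden state untouched, which is the separation the abstract describes.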
