Similarity-Aware Kanerva Coding for On-Line Reinforcement Learning

Wei Li, W. Meleis
{"title":"Similarity-Aware Kanerva Coding for On-Line Reinforcement Learning","authors":"Wei Li, W. Meleis","doi":"10.1145/3271553.3271609","DOIUrl":null,"url":null,"abstract":"A major challenge in reinforcement learning (RL) is use of a tabular representation to represent learned policies with a large number of states or state-action pairs. Function approximation is a promising tool to overcome this deficiency. This approach uses parameterized functions instead of a table to represent learned knowledge and enables generalization. However, existing schemes cannot solve realistic RL problems, with their rapidly increasing demands for approximating accuracy and efficiency. In this paper, we extend the architecture of Sparse Distributed Memories (SDMs) and propose a novel on-line methodology, similarity-aware Kanerva coding (SAK), that closely represents the learned knowledge for very large-scale problems with significantly fewer parameterized components. SAK directly measures the state variables' real distances in all dimensions and reformulates a new state similarity metric with an improved definition of state closeness. As a result, our scheme accurately distributes and generalizes knowledge among related states. We further enhance SAK's efficiency by allowing a limited number of prototype states that have certain similarities to be activated for value approximation so that the risk of over-generalization is hindered. In addition, SAK eliminates size tuning and prototype reallocation for the prototype set, resulting in not only broadened scalability but also significant savings in the amount of necessary prototypes and computational overhead needed for RL. Our extensive experimental results show that SAK achieves more than 48% improvements over existing schemes in learning quality, and reveal that SAK is able to consistently learn good policies for RL with small overhead and short training times, even given roughly tuned scheme parameters.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3271553.3271609","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

A major challenge in reinforcement learning (RL) is the use of a tabular representation to represent learned policies when the number of states or state-action pairs is large. Function approximation is a promising tool to overcome this deficiency: it uses parameterized functions instead of a table to represent learned knowledge and enables generalization. However, existing schemes cannot solve realistic RL problems, whose demands for approximation accuracy and efficiency increase rapidly. In this paper, we extend the architecture of Sparse Distributed Memories (SDMs) and propose a novel on-line methodology, similarity-aware Kanerva coding (SAK), that closely represents the learned knowledge for very large-scale problems with significantly fewer parameterized components. SAK directly measures the state variables' real distances in all dimensions and reformulates a new state similarity metric with an improved definition of state closeness. As a result, our scheme accurately distributes and generalizes knowledge among related states. We further enhance SAK's efficiency by allowing only a limited number of prototype states with sufficient similarity to be activated for value approximation, so that the risk of over-generalization is mitigated. In addition, SAK eliminates size tuning and prototype reallocation for the prototype set, resulting not only in broadened scalability but also in significant savings in the number of prototypes and the computational overhead needed for RL. Our extensive experimental results show that SAK achieves more than 48% improvement over existing schemes in learning quality, and reveal that SAK consistently learns good policies with small overhead and short training times, even given roughly tuned scheme parameters.
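The abstract describes SAK only at a high level. As a rough illustration, the sketch below shows a Kanerva-style value approximator that (a) scores prototypes by real-valued distances in all state dimensions rather than the bit-wise Hamming distance of classic Kanerva coding, and (b) activates only a capped number of the most similar prototypes. The class name, the reciprocal similarity function, the top-k activation rule, and all parameters are illustrative assumptions; the paper's exact similarity metric and update rule may differ.

```python
import numpy as np

class SimilarityAwareKanerva:
    """Minimal sketch of a similarity-aware Kanerva coder (illustrative,
    not the paper's exact formulation).

    Prototypes are points in the state space. For a query state, only
    the k most similar prototypes are activated, and Q(s, a) is a
    similarity-weighted sum of the activated prototypes' weights.
    """

    def __init__(self, prototypes, n_actions, k=10, alpha=0.1):
        self.prototypes = np.asarray(prototypes)            # (P, D) prototype states
        self.theta = np.zeros((len(self.prototypes), n_actions))  # learned weights
        self.k = k                                          # activation cap
        self.alpha = alpha                                  # learning rate

    def _features(self, state):
        # Real-valued distance in every dimension (L1), not a bit-wise
        # Hamming distance as in classic Kanerva coding.
        dists = np.abs(self.prototypes - state).sum(axis=1)
        sims = 1.0 / (1.0 + dists)        # assumed similarity function
        active = np.argsort(sims)[-self.k:]  # activate only the k most similar
        phi = np.zeros(len(self.prototypes))
        phi[active] = sims[active] / sims[active].sum()  # normalized activation
        return phi

    def q_value(self, state, action):
        return self._features(state) @ self.theta[:, action]

    def update(self, state, action, target):
        # One on-line gradient step toward a TD target,
        # e.g. target = r + gamma * max_a' Q(s', a').
        phi = self._features(state)
        td_error = target - phi @ self.theta[:, action]
        self.theta[:, action] += self.alpha * td_error * phi

# Hypothetical usage on a 2-D continuous state space with a fixed
# prototype set (no size tuning or reallocation after construction):
protos = np.random.uniform(0.0, 1.0, size=(500, 2))
sak = SimilarityAwareKanerva(protos, n_actions=3, k=10, alpha=0.1)
sak.update(np.array([0.4, 0.7]), action=1, target=2.5)
print(sak.q_value(np.array([0.4, 0.7]), 1))
```

Capping activation at k prototypes is what the abstract credits with limiting over-generalization: each update touches only the weights of prototypes genuinely similar to the visited state.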