Similarity-Aware Kanerva Coding for On-Line Reinforcement Learning

Wei Li, W. Meleis
{"title":"Similarity-Aware Kanerva Coding for On-Line Reinforcement Learning","authors":"Wei Li, W. Meleis","doi":"10.1145/3271553.3271609","DOIUrl":null,"url":null,"abstract":"A major challenge in reinforcement learning (RL) is use of a tabular representation to represent learned policies with a large number of states or state-action pairs. Function approximation is a promising tool to overcome this deficiency. This approach uses parameterized functions instead of a table to represent learned knowledge and enables generalization. However, existing schemes cannot solve realistic RL problems, with their rapidly increasing demands for approximating accuracy and efficiency. In this paper, we extend the architecture of Sparse Distributed Memories (SDMs) and propose a novel on-line methodology, similarity-aware Kanerva coding (SAK), that closely represents the learned knowledge for very large-scale problems with significantly fewer parameterized components. SAK directly measures the state variables' real distances in all dimensions and reformulates a new state similarity metric with an improved definition of state closeness. As a result, our scheme accurately distributes and generalizes knowledge among related states. We further enhance SAK's efficiency by allowing a limited number of prototype states that have certain similarities to be activated for value approximation so that the risk of over-generalization is hindered. In addition, SAK eliminates size tuning and prototype reallocation for the prototype set, resulting in not only broadened scalability but also significant savings in the amount of necessary prototypes and computational overhead needed for RL. Our extensive experimental results show that SAK achieves more than 48% improvements over existing schemes in learning quality, and reveal that SAK is able to consistently learn good policies for RL with small overhead and short training times, even given roughly tuned scheme parameters.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3271553.3271609","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

A major challenge in reinforcement learning (RL) is the use of a tabular representation to represent learned policies when the number of states or state-action pairs is large. Function approximation is a promising tool to overcome this deficiency: it uses parameterized functions instead of a table to represent learned knowledge and enables generalization. However, existing schemes cannot solve realistic RL problems, whose demands for approximation accuracy and efficiency increase rapidly. In this paper, we extend the architecture of Sparse Distributed Memories (SDMs) and propose a novel on-line methodology, similarity-aware Kanerva coding (SAK), that closely represents the learned knowledge for very large-scale problems with significantly fewer parameterized components. SAK directly measures the state variables' real distances in all dimensions and reformulates a new state similarity metric with an improved definition of state closeness. As a result, our scheme accurately distributes and generalizes knowledge among related states. We further enhance SAK's efficiency by allowing only a limited number of prototype states with sufficient similarity to be activated for value approximation, so that the risk of over-generalization is mitigated. In addition, SAK eliminates size tuning and prototype reallocation for the prototype set, resulting not only in broadened scalability but also in significant savings in the number of prototypes and the computational overhead needed for RL. Our extensive experimental results show that SAK achieves more than 48% improvement over existing schemes in learning quality, and reveal that SAK consistently learns good policies with small overhead and short training times, even given roughly tuned scheme parameters.
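The abstract describes SAK only at a high level. As a rough illustration, the sketch below shows a Kanerva-style value approximator that (a) scores prototypes by real-valued distances in all state dimensions rather than the bit-wise Hamming distance of classic Kanerva coding, and (b) activates only a capped number of the most similar prototypes. The class name, the reciprocal similarity function, the top-k activation rule, and all parameters are illustrative assumptions; the paper's exact similarity metric and update rule may differ.

```python
import numpy as np

class SimilarityAwareKanerva:
    """Minimal sketch of a similarity-aware Kanerva coder (illustrative,
    not the paper's exact formulation).

    Prototypes are points in the state space. For a query state, only
    the k most similar prototypes are activated, and Q(s, a) is a
    similarity-weighted sum of the activated prototypes' weights.
    """

    def __init__(self, prototypes, n_actions, k=10, alpha=0.1):
        self.prototypes = np.asarray(prototypes)            # (P, D) prototype states
        self.theta = np.zeros((len(self.prototypes), n_actions))  # learned weights
        self.k = k                                          # activation cap
        self.alpha = alpha                                  # learning rate

    def _features(self, state):
        # Real-valued distance in every dimension (L1), not a bit-wise
        # Hamming distance as in classic Kanerva coding.
        dists = np.abs(self.prototypes - state).sum(axis=1)
        sims = 1.0 / (1.0 + dists)        # assumed similarity function
        active = np.argsort(sims)[-self.k:]  # activate only the k most similar
        phi = np.zeros(len(self.prototypes))
        phi[active] = sims[active] / sims[active].sum()  # normalized activation
        return phi

    def q_value(self, state, action):
        return self._features(state) @ self.theta[:, action]

    def update(self, state, action, target):
        # One on-line gradient step toward a TD target,
        # e.g. target = r + gamma * max_a' Q(s', a').
        phi = self._features(state)
        td_error = target - phi @ self.theta[:, action]
        self.theta[:, action] += self.alpha * td_error * phi

# Hypothetical usage on a 2-D continuous state space with a fixed
# prototype set (no size tuning or reallocation after construction):
protos = np.random.uniform(0.0, 1.0, size=(500, 2))
sak = SimilarityAwareKanerva(protos, n_actions=3, k=10, alpha=0.1)
sak.update(np.array([0.4, 0.7]), action=1, target=2.5)
print(sak.q_value(np.array([0.4, 0.7]), 1))
```

Capping activation at k prototypes is what the abstract credits with limiting over-generalization: each update touches only the weights of prototypes genuinely similar to the visited state.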