Fine-tuning Deep Reinforcement Learning Policies with r-STDP for Domain Adaptation

Mahmoud Akl, Yulia Sandamirskaya, Deniz Ergene, Florian Walter, Alois Knoll
{"title":"基于r-STDP的深度强化学习策略微调","authors":"Mahmoud Akl, Yulia Sandamirskaya, Deniz Ergene, Florian Walter, Alois Knoll","doi":"10.1145/3546790.3546804","DOIUrl":null,"url":null,"abstract":"Using deep reinforcement learning policies that are trained in simulation on real robotic platforms requires fine-tuning due to discrepancies between simulated and real environments. Multiple methods like domain randomization and system identification have been suggested to overcome this problem. However, sim-to-real transfer remains an open problem in robotics and deep reinforcement learning. In this paper, we present a spiking neural network (SNN) alternative for dealing with the sim-to-real problem. In particular, we train SNNs with backpropagation using surrogate gradients and the (Deep Q-Network) DQN algorithm to solve two classical control reinforcement learning tasks. The performance of the trained DQNs degrades when evaluated on randomized versions of the environments used during training. To compensate for the drop in performance, we apply the biologically plausible reward-modulated spike timing dependent plasticity (r-STDP) learning rule. Our results show that r-STDP can be successfully utilized to restore the network’s ability to solve the task. Furthermore, since r-STDP can be directly implemented on neuromorphic hardware, we believe it provides a promising neuromorphic solution to the sim-to-real problem.","PeriodicalId":104528,"journal":{"name":"Proceedings of the International Conference on Neuromorphic Systems 2022","volume":"149 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Fine-tuning Deep Reinforcement Learning Policies with r-STDP for Domain Adaptation\",\"authors\":\"Mahmoud Akl, Yulia Sandamirskaya, Deniz Ergene, Florian Walter, Alois Knoll\",\"doi\":\"10.1145/3546790.3546804\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Using deep reinforcement learning policies that are trained in simulation on real robotic platforms requires fine-tuning due to discrepancies between simulated and real environments. Multiple methods like domain randomization and system identification have been suggested to overcome this problem. However, sim-to-real transfer remains an open problem in robotics and deep reinforcement learning. In this paper, we present a spiking neural network (SNN) alternative for dealing with the sim-to-real problem. In particular, we train SNNs with backpropagation using surrogate gradients and the (Deep Q-Network) DQN algorithm to solve two classical control reinforcement learning tasks. The performance of the trained DQNs degrades when evaluated on randomized versions of the environments used during training. To compensate for the drop in performance, we apply the biologically plausible reward-modulated spike timing dependent plasticity (r-STDP) learning rule. Our results show that r-STDP can be successfully utilized to restore the network’s ability to solve the task. 
Furthermore, since r-STDP can be directly implemented on neuromorphic hardware, we believe it provides a promising neuromorphic solution to the sim-to-real problem.\",\"PeriodicalId\":104528,\"journal\":{\"name\":\"Proceedings of the International Conference on Neuromorphic Systems 2022\",\"volume\":\"149 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Conference on Neuromorphic Systems 2022\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3546790.3546804\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on Neuromorphic Systems 2022","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3546790.3546804","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Using deep reinforcement learning policies that are trained in simulation on real robotic platforms requires fine-tuning due to discrepancies between simulated and real environments. Multiple methods like domain randomization and system identification have been suggested to overcome this problem. However, sim-to-real transfer remains an open problem in robotics and deep reinforcement learning. In this paper, we present a spiking neural network (SNN) alternative for dealing with the sim-to-real problem. In particular, we train SNNs with backpropagation using surrogate gradients and the Deep Q-Network (DQN) algorithm to solve two classical control reinforcement learning tasks. The performance of the trained DQNs degrades when evaluated on randomized versions of the environments used during training. To compensate for the drop in performance, we apply the biologically plausible reward-modulated spike-timing-dependent plasticity (r-STDP) learning rule. Our results show that r-STDP can be successfully utilized to restore the network's ability to solve the task. Furthermore, since r-STDP can be directly implemented on neuromorphic hardware, we believe it provides a promising neuromorphic solution to the sim-to-real problem.
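The abstract's training phase, backpropagation through an SNN with surrogate gradients, hinges on replacing the non-differentiable spike nonlinearity with a smooth stand-in during the backward pass. Below is a minimal PyTorch sketch of this idea using a SuperSpike-style fast-sigmoid surrogate; the class name, steepness constant, and threshold convention are illustrative assumptions, not the paper's implementation.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike nonlinearity with a fast-sigmoid surrogate gradient.

    Forward: emit a spike (1.0) when the threshold-shifted membrane
    potential is positive. Backward: replace the ill-defined Heaviside
    derivative with the smooth fast-sigmoid derivative so that errors
    can backpropagate through spiking layers.
    """

    SCALE = 10.0  # surrogate steepness; illustrative value, not from the paper

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0.0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Derivative of the fast sigmoid 1 / (scale*|v| + 1), squared form
        surrogate = 1.0 / (SurrogateSpike.SCALE * v.abs() + 1.0) ** 2
        return grad_output * surrogate

spike_fn = SurrogateSpike.apply  # usable inside any LIF/DQN forward pass
```

Inside a DQN, such a spiking layer would take the place of the usual ReLU activations, letting the standard Q-learning loss be minimized end to end.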
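The fine-tuning phase relies on r-STDP, a three-factor rule. In its generic form (a sketch; the paper's exact trace dynamics and constants are not given here), each synapse accumulates an eligibility trace driven by pre/post spike timing, and the trace is converted into a weight change only when a scalar reward signal arrives:

$$
e_{ij}(t) = \lambda\, e_{ij}(t-1) + \mathrm{STDP}\!\left(\Delta t_{ij}\right),
\qquad
\Delta w_{ij} = \eta\, R(t)\, e_{ij}(t),
$$

where $\Delta t_{ij}$ is the pre/post spike-time difference at synapse $ij$, $\lambda \in [0, 1)$ is the trace decay, $\eta$ is a learning rate, and $R(t)$ is the reward. Because the update uses only locally available spike times and a global scalar reward, it maps directly onto the on-chip learning rules of neuromorphic hardware, which is the deployment path the abstract alludes to.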