{"title":"Distributional Reinforcement Learning with Quantum Neural Networks","authors":"Wei Hu, James Hu","doi":"10.4236/ICA.2019.102004","DOIUrl":null,"url":null,"abstract":"Traditional reinforcement learning (RL) uses the return, also known as the expected value of cumulative random rewards, for training an agent to learn an optimal policy. However, recent research indicates that learning the distribution over returns has distinct advantages over learning their expected value as seen in different RL tasks. The shift from using the expectation of returns in traditional RL to the distribution over returns in distributional RL has provided new insights into the dynamics of RL. This paper builds on our recent work investigating the quantum approach towards RL. Our work implements the quantile regression (QR) distributional Q learning with a quantum neural network. This quantum network is evaluated in a grid world environment with a different number of quantiles, illustrating its detailed influence on the learning of the algorithm. It is also compared to the standard quantum Q learning in a Markov Decision Process (MDP) chain, which demonstrates that the quantum QR distributional Q learning can explore the environment more efficiently than the standard quantum Q learning. Efficient exploration and balancing of exploitation and exploration are major challenges in RL. Previous work has shown that more informative actions can be taken with a distributional perspective. Our findings suggest another cause for its success: the enhanced performance of distributional RL can be partially attributed to its superior ability to efficiently explore the environment.","PeriodicalId":62904,"journal":{"name":"智能控制与自动化(英文)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"智能控制与自动化(英文)","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.4236/ICA.2019.102004","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Traditional reinforcement learning (RL) trains an agent to learn an optimal policy using the expected value of the return, the cumulative sum of random rewards. However, recent research indicates that learning the distribution over returns has distinct advantages over learning only their expected value, as seen across different RL tasks. The shift from the expectation of returns in traditional RL to the distribution over returns in distributional RL has provided new insights into the dynamics of RL. This paper builds on our recent work investigating the quantum approach to RL. We implement quantile regression (QR) distributional Q-learning with a quantum neural network. This quantum network is evaluated in a grid-world environment with different numbers of quantiles, illustrating in detail their influence on the learning of the algorithm. It is also compared to standard quantum Q-learning in a Markov Decision Process (MDP) chain, which demonstrates that quantum QR distributional Q-learning can explore the environment more efficiently than standard quantum Q-learning. Efficient exploration and the balancing of exploitation and exploration are major challenges in RL. Previous work has shown that more informative actions can be taken with a distributional perspective. Our findings suggest another cause for this success: the enhanced performance of distributional RL can be partially attributed to its superior ability to explore the environment efficiently.
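As a rough illustration of the quantile regression update at the core of QR distributional Q-learning, the sketch below implements a minimal classical tabular version in NumPy. The table `theta` stands in for the quantum neural network the paper trains, and all sizes and hyperparameters (`n_states`, `n_actions`, `n_quantiles`, `alpha`, `gamma`) are assumed for illustration; this is not the authors' quantum implementation.

```python
import numpy as np

# Hypothetical tabular sketch of quantile-regression (QR) distributional
# Q-learning for a small discrete environment. The paper represents the
# quantiles with a quantum neural network; a plain NumPy table stands in
# for that network here, so only the QR update rule is illustrated.

n_states, n_actions, n_quantiles = 16, 4, 8   # assumed environment sizes
alpha, gamma = 0.1, 0.99                      # assumed learning rate, discount

# theta[s, a, i] estimates the i-th quantile of the return Z(s, a)
theta = np.zeros((n_states, n_actions, n_quantiles))
# quantile midpoints tau_hat_i = (2i + 1) / (2N)
tau_hat = (2 * np.arange(n_quantiles) + 1) / (2 * n_quantiles)

def q_values(s):
    """Expected return per action = mean over the quantile estimates."""
    return theta[s].mean(axis=1)

def qr_update(s, a, r, s_next, done):
    """One quantile-regression TD update for transition (s, a, r, s_next)."""
    a_star = int(np.argmax(q_values(s_next)))  # greedy action at s_next
    # distributional targets: one per next-state quantile (just r if terminal)
    target = np.atleast_1d(r + (0.0 if done else gamma * theta[s_next, a_star]))
    for i in range(n_quantiles):
        # quantile-regression subgradient, averaged over all target samples:
        # the indicator is 1 where a target falls below the current quantile
        indicator = (target < theta[s, a, i]).astype(float)
        theta[s, a, i] += alpha * np.mean(tau_hat[i] - indicator)
```

The update nudges each quantile estimate up with weight tau_hat[i] and down with weight 1 - tau_hat[i], so each converges toward its target quantile of the return distribution; acting on `q_values` recovers standard Q-learning behavior while the full distribution remains available for exploration.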