Quantum-accessible reinforcement learning beyond strictly epochal environments.

IF 4.1 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Quantum Machine Intelligence Pub Date : 2021-01-01 Epub Date: 2021-08-02 DOI:10.1007/s42484-021-00049-7
A Hamann, V Dunjko, S Wölk
Quantum Machine Intelligence, volume 3, issue 2, article 22. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550166/pdf/
Citations: 8

Abstract

In recent years, quantum-enhanced machine learning has emerged as a particularly fruitful application of quantum algorithms, covering aspects of supervised, unsupervised and reinforcement learning. Reinforcement learning offers numerous options for how quantum theory can be applied and is arguably the least explored of the three from a quantum perspective. In reinforcement learning, an agent explores an environment and tries to find a behavior that optimizes some figure of merit. Some of the first approaches investigated settings where this exploration can be sped up by considering quantum analogs of classical environments, which can then be queried in superposition. If an environment has a strict periodic structure in time (i.e., is strictly episodic), it can be effectively converted to a conventional oracle of the kind encountered in quantum information. In general environments, however, we obtain scenarios that generalize standard oracle tasks. In this work, we consider one such generalization, where the environment is not strictly episodic, which is mapped to an oracle identification setting with a changing oracle. We analyze this case and show that standard amplitude-amplification techniques can, with minor modifications, still be applied to achieve quadratic speed-ups. In addition, we prove that an algorithm based on Grover iterations is optimal for oracle identification even if the oracle changes over time such that the "rewarded space" is monotonically increasing. This result constitutes one of the first generalizations of quantum-accessible reinforcement learning.
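To make the "changing oracle" setting more concrete, the following is a minimal classical state-vector sketch of textbook Grover iterations in which the set of rewarded items may grow between steps. It is not the algorithm or the proof from the paper; the function names, the problem size, and the schedule of rewarded sets are all hypothetical choices made for illustration.

```python
import math


def grover_step(amps, marked):
    """One textbook Grover iteration on a list of real amplitudes.

    First the oracle flips the sign of every marked amplitude, then the
    diffusion step reflects all amplitudes about their mean.
    """
    flipped = [-a if i in marked else a for i, a in enumerate(amps)]
    mean = sum(flipped) / len(flipped)
    return [2 * mean - a for a in flipped]


def success_probability(n_items, marked_schedule):
    """Apply one Grover iteration per entry of marked_schedule.

    marked_schedule[t] is the (hypothetical) rewarded set the oracle uses
    at step t. Returns the probability of measuring an item that belongs
    to the final rewarded set.
    """
    amps = [1 / math.sqrt(n_items)] * n_items  # uniform superposition
    for marked in marked_schedule:
        amps = grover_step(amps, marked)
    final = marked_schedule[-1]
    return sum(amps[i] ** 2 for i in final)


# Assumed toy instance: 64 items and 6 iterations, close to (pi/4)*sqrt(64).
n = 64
static_schedule = [{3}] * 6                   # oracle never changes
growing_schedule = [{3}] * 3 + [{3, 17}] * 3  # rewarded set grows once

p_static = success_probability(n, static_schedule)
p_growing = success_probability(n, growing_schedule)
print(f"static oracle:  {p_static:.3f}")
print(f"growing oracle: {p_growing:.3f}")
```

Comparing both printed values with the uniform-guess baseline (the size of the final rewarded set divided by `n`) gives a rough feel for how amplitude amplification behaves when the rewarded set only ever grows. The sketch says nothing about optimality, which is the subject of the paper's proof.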

Source journal metrics: CiteScore 7.60; self-citation rate 4.20%; articles published: 29