Quantum-accessible reinforcement learning beyond strictly epochal environments.

Impact factor: 4.4; JCR quartile: Q2 (Computer Science, Artificial Intelligence)

Quantum Machine Intelligence. Pub date: 2021-01-01. Epub date: 2021-08-02. DOI: 10.1007/s42484-021-00049-7

A Hamann, V Dunjko, S Wölk

Vol. 3, no. 2, p. 22. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550166/pdf/

Citations: 8

Abstract

In recent years, quantum-enhanced machine learning has emerged as a particularly fruitful application of quantum algorithms, covering aspects of supervised, unsupervised, and reinforcement learning. Reinforcement learning offers numerous options for how quantum theory can be applied and is arguably the least explored of the three from a quantum perspective. Here, an agent explores an environment and tries to find a behavior that optimizes some figure of merit. Some of the first approaches investigated settings where this exploration can be sped up by considering quantum analogs of classical environments, which can then be queried in superposition. If the environments have a strict periodic structure in time (i.e., are strictly episodic), they can be effectively converted to conventional oracles encountered in quantum information. In general environments, however, we obtain scenarios that generalize standard oracle tasks. In this work, we consider one such generalization, where the environment is not strictly episodic; it is mapped to an oracle identification setting with a changing oracle. We analyze this case and show that standard amplitude-amplification techniques can, with minor modifications, still be applied to achieve quadratic speed-ups. In addition, we prove that an algorithm based on Grover iterations is optimal for oracle identification even if the oracle changes over time in such a way that the "rewarded space" is monotonically increasing. This result constitutes one of the first generalizations of quantum-accessible reinforcement learning.
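To make the "changing oracle" setting more concrete, the following is a minimal classical simulation of Grover iterations in which the set of rewarded indices may grow between iterations. It is only an illustrative sketch, not the algorithm analyzed in the paper: the function name `grover_success_probability`, the `marked_schedule` parameter, and the example schedules are assumptions introduced here, and the oracle is modeled as a plain sign flip on the currently rewarded basis states.

```python
import math
from typing import Sequence, Set


def grover_success_probability(
    n_items: int, marked_schedule: Sequence[Set[int]]
) -> float:
    """Simulate Grover iterations with a possibly changing oracle.

    marked_schedule[t] is the (hypothetical) set of rewarded indices that the
    oracle marks in iteration t. Returns the probability that a measurement
    after the last iteration yields an index in the final rewarded set.
    """
    # Start in the uniform superposition; amplitudes stay real throughout.
    amp = [1.0 / math.sqrt(n_items)] * n_items
    for marked in marked_schedule:
        # Oracle step: flip the sign of every currently rewarded index.
        for i in marked:
            amp[i] = -amp[i]
        # Diffusion step: reflect each amplitude about the mean amplitude.
        mean = sum(amp) / n_items
        amp = [2.0 * mean - a for a in amp]
    final = marked_schedule[-1] if marked_schedule else set()
    return sum(amp[i] ** 2 for i in final)


if __name__ == "__main__":
    n = 64
    # Static oracle: one rewarded index, about (pi/4) * sqrt(n) iterations.
    static = [{5}] * int(math.pi / 4 * math.sqrt(n))
    print("static oracle: ", grover_success_probability(n, static))
    # Growing oracle: the rewarded set gains an element partway through.
    growing = [{5}] * 3 + [{5, 9}] * 2
    print("growing oracle:", grover_success_probability(n, growing))
```

In this toy model the rewarded set only ever gains elements, which mirrors the monotonically increasing "rewarded space" mentioned in the abstract. How many iterations to run under a given schedule is a separate question that the sketch does not address.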


Source journal: Quantum Machine Intelligence Multiple-
CiteScore: 7.60
Self-citation rate: 4.20%
Number of publications: