Performance comparison of the quantum and classical deep Q-learning approaches in dynamic environments control

IF 5.6 · CAS Tier 2 (Physics & Astronomy) · JCR Q1 (Optics)
Aramchehr Zare, Mehrdad Boroushaki
DOI: 10.1140/epjqt/s40507-025-00381-y
Journal: EPJ Quantum Technology, Vol. 12, No. 1
Published: 2025-06-16 (Journal Article)
Open-access PDF: https://epjquantumtechnology.springeropen.com/counter/pdf/10.1140/epjqt/s40507-025-00381-y
Article page: https://link.springer.com/article/10.1140/epjqt/s40507-025-00381-y
Citations: 0

Abstract

There is a lack of adequate studies on dynamic-environment control with Quantum Reinforcement Learning (QRL) algorithms, a significant gap in the field. This study contributes to bridging that gap by demonstrating the potential of quantum RL algorithms to handle dynamic environments effectively. The performance and robustness of quantum Deep Q-learning Networks (DQNs) were examined in two dynamic environments, Cart Pole and Lunar Lander, using three distinct quantum Ansatz layers: RealAmplitudes, EfficientSU2, and TwoLocal. The quantum DQNs were compared with classical DQN algorithms in terms of convergence speed, loss minimization, and Q-value behavior. The RealAmplitudes Ansatz outperformed the other quantum circuits, converging faster and minimizing the loss function more effectively. To assess robustness, the pole length was increased in the Cart Pole environment and a wind function was added to the Lunar Lander environment after the 50th episode. All three quantum Ansatz layers maintained robust performance under the disturbed conditions, with consistent reward values, loss minimization, and stable Q-value distributions. Although the proposed quantum DQNs deliver competitive results overall, classical RL can surpass them in convergence speed under specific conditions.
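The three circuit templates named in the abstract are standard Qiskit ansatz families. RealAmplitudes, the best performer here, alternates layers of RY rotations with CX entanglement, so the prepared state has only real amplitudes. As an illustration only (the paper's actual networks, qubit counts, and hyperparameters are not reproduced here), a minimal NumPy statevector sketch of a RealAmplitudes-style circuit with linear entanglement:

```python
import numpy as np

def ry(theta):
    # Single-qubit RY rotation matrix (real-valued).
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, qubit, n):
    # Apply a single-qubit gate to `qubit` of an n-qubit statevector
    # (qubit 0 is the most significant bit in this convention).
    ops = [np.eye(2)] * n
    ops[qubit] = gate
    full = ops[0]
    for op in ops[1:]:
        full = np.kron(full, op)
    return full @ state

def apply_cx(state, ctrl, tgt, n):
    # CNOT: flip the target bit of every basis state whose control bit is 1.
    new = state.copy()
    for i in range(2 ** n):
        if (i >> (n - 1 - ctrl)) & 1:
            new[i] = state[i ^ (1 << (n - 1 - tgt))]
    return new

def real_amplitudes(params, n=2, reps=1):
    # RealAmplitudes-style ansatz: (reps + 1) RY layers separated by
    # linear CX entanglement; expects n * (reps + 1) angles.
    state = np.zeros(2 ** n)
    state[0] = 1.0  # start in |0...0>
    it = iter(params)
    for layer in range(reps + 1):
        for q in range(n):
            state = apply_1q(state, ry(next(it)), q, n)
        if layer < reps:
            for q in range(n - 1):
                state = apply_cx(state, q, q + 1, n)
    return state
```

In a quantum DQN, the n * (reps + 1) rotation angles play the role of a dense layer's weights: states are encoded into the circuit, the angles are trained by gradient descent on the Bellman loss, and measured expectation values serve as Q-value estimates.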

Source journal: EPJ Quantum Technology (Physics and Astronomy - Atomic and Molecular Physics, and Optics)
CiteScore: 7.70
Self-citation rate: 7.50%
Annual article output: 28
Review turnaround: 71 days
About the journal: Driven by advances in technology and experimental capability, the last decade has seen the emergence of quantum technology: a new praxis for controlling the quantum world. It is now possible to engineer complex, multi-component systems that merge the once distinct fields of quantum optics and condensed matter physics. EPJ Quantum Technology covers theoretical and experimental advances in subjects including but not limited to the following:
- Quantum measurement, metrology and lithography
- Quantum complex systems, networks and cellular automata
- Quantum electromechanical systems
- Quantum optomechanical systems
- Quantum machines, engineering and nanorobotics
- Quantum control theory
- Quantum information, communication and computation
- Quantum thermodynamics
- Quantum metamaterials
- The effect of Casimir forces on micro- and nano-electromechanical systems
- Quantum biology
- Quantum sensing
- Hybrid quantum systems
- Quantum simulations