Robust quantum control using reinforcement learning from demonstration

Journal: npj Quantum Information · IF 6.6 · JCR Q1 (Physics, Applied) · CAS Tier 1 (Physics & Astronomy)
Authors: Shengyong Li, Yidian Fan, Xiang Li, Xinhui Ruan, Qianchuan Zhao, Zhihui Peng, Re-Bing Wu, Jing Zhang, Pengtao Song
DOI: 10.1038/s41534-025-01065-2 (https://doi.org/10.1038/s41534-025-01065-2)
Published: 2025-07-25 · Citations: 0

Abstract

Quantum control requires high-precision, robust control pulses to ensure optimal system performance. However, control sequences generated from a system model may suffer from model bias, leading to low fidelity. While model-free reinforcement learning (RL) methods have been developed to avoid such bias, training an RL agent from scratch is time-consuming, often taking hours to gather enough samples for convergence. This challenge has hindered the broad application of RL techniques to larger and more complex quantum control problems, limiting their adaptability. In this work, we use Reinforcement Learning from Demonstration (RLfD) to leverage control sequences generated with system models and then optimize them further with RL to remove the model bias. By starting from reasonable pulse shapes instead of learning from scratch, this approach improves sample efficiency, reducing the number of samples needed for convergence and hence the training time. As a result, the method can handle pulse shapes discretized into more than 1000 pieces without compromising final fidelity. We simulate the preparation of several high-fidelity non-classical states using RLfD, and we also find that the training process is more stable with RLfD. In addition, the method is well suited to fast RL-based gate calibration.
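The full paper is behind the DOI above, so as a rough illustration of the warm-start idea the abstract describes, the sketch below fine-tunes a deliberately biased model-based pulse with a simple REINFORCE-style perturbation agent. Everything in it is a hypothetical stand-in, not the authors' algorithm: measure_fidelity is a toy proxy for an experimentally measured fidelity, the 0.85 amplitude error plays the role of model bias, and the toy uses 64 segments where the paper handles more than 1000.

```python
# Minimal, self-contained sketch of the RLfD warm-start idea (assumptions only:
# the toy fidelity proxy, the Gaussian-perturbation agent, and all
# hyperparameters are illustrative stand-ins for the paper's actual setup).
import numpy as np

rng = np.random.default_rng(0)
N_SEG = 64                 # pulse segments (toy-sized; the paper scales to >1000)
N_ROLLOUTS = 16            # fidelity evaluations per update step
SIGMA, LR, STEPS = 0.1, 0.02, 200

# Unknown "true" optimal pulse, and a biased model-based demonstration
# (standing in for the output of an optimizer run on an imperfect model).
true_pulse = np.sin(np.linspace(0.0, np.pi, N_SEG))
demo_pulse = 0.85 * true_pulse          # systematic model bias: wrong amplitude

def measure_fidelity(pulse: np.ndarray) -> float:
    """Toy stand-in for an experimentally measured fidelity."""
    return float(np.exp(-np.mean((pulse - true_pulse) ** 2)))

# Model-free fine-tuning, warm-started from the demonstration. Each step
# samples Gaussian perturbations of the current pulse, measures their
# fidelities, and steps along the advantage-weighted perturbations
# (a REINFORCE-style ascent-direction estimate).
pulse = demo_pulse.copy()
for _ in range(STEPS):
    eps = rng.standard_normal((N_ROLLOUTS, N_SEG))
    rewards = np.array([measure_fidelity(pulse + SIGMA * e) for e in eps])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-12)  # normalized advantage
    pulse += LR * (adv @ eps) / N_ROLLOUTS        # estimated ascent direction

print(f"demonstration fidelity: {measure_fidelity(demo_pulse):.4f}")
print(f"fine-tuned fidelity:    {measure_fidelity(pulse):.4f}")
```

In the real setting the reward would come from measurements on the device rather than from a known target pulse; the point of the demonstration is that the very first rollouts already earn high reward, which is the sample-efficiency argument the abstract makes.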

Source journal: npj Quantum Information (Computer Science: Computer Science, miscellaneous)
CiteScore: 13.70 · Self-citation rate: 3.90% · Articles per year: 130 · Review time: 29 weeks
Journal description: The scope of npj Quantum Information spans all relevant disciplines, fields, approaches and levels, and so considers outstanding work ranging from fundamental research to applications and technologies.