Measurement-Informed Safe Reinforcement Learning for Quantum Battery Charging via Harmonic-Syndrome Diagnostics and BMS Constraints
Authors: Sangkeum Lee; Beomdo Park; Junseong Park; Hyeonseok Jang; Hoon Jeong; Taewook Heo
Journal: IEEE Transactions on Quantum Engineering, vol. 7, pp. 1-15 (publication date 2026-01-01; Epub 2026/3/3)
DOI: 10.1109/TQE.2026.3670136
URL: https://ieeexplore.ieee.org/document/11419848/
Citation count: 0
Abstract
Quantum batteries promise ultrafast energy storage but are highly sensitive to noise, drift, and hardware constraints, making safe high-performance charging a central challenge for noisy intermediate-scale quantum devices. We propose a measurement-informed safe control framework that couples harmonic-spectrum-based syndrome diagnostics ($H_{2}/H_{1}$, $H_{3}/H_{1}$, and frequency drift) with a battery management system (BMS)-constrained curriculum reinforcement learning (RL) policy. Spectral features are compressed into a three-level syndrome code ($s\in \lbrace 0,1,2\rbrace$) that serves as a real-time hardware risk proxy for the controller. Our digital-twin simulator incorporates $T_{1}/T_{\phi}$ relaxation, crosstalk, collective effects, and terminal-voltage dynamics, while safety risks are explicitly encoded as BMS-related penalties (state-of-health, voltage limits, and high-risk operation ratio) in the RL reward. Across staged curricula of increasing system complexity, the learned policy empirically traces a strictly improved Pareto frontier between final ergotropy and high-risk ratio compared with baseline and threshold-grid control strategies, with gains confirmed by multiseed statistical confidence intervals. To support near-term deployment, we position the current work as a digital-twin stage and outline a concrete simulation-to-real protocol: fix receiver-operating-characteristic-calibrated thresholds, retune $(\tau^{w},\tau^{h})$ on a small hardware calibration split, and validate a one-step voltage shield. We further demonstrate the framework on a benchtop transmon setup with $N=1$–$2$, reporting shield trigger/violation rates, sim-to-real drift of spectral features measured by Kullback–Leibler (KL) divergence and earth mover's distance (EMD), and an end-to-end latency within 20 $\mu\mathrm{s}$, indicating that harmonic-syndrome-informed safe RL is a viable route toward practical quantum battery charging control.
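To make the idea of compressing spectral features into a three-level syndrome code concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not state how $H_{2}/H_{1}$, $H_{3}/H_{1}$, and frequency drift are estimated or thresholded. The function names `harmonic_features` and `syndrome_code`, the Hann-windowed FFT estimator, the "any feature exceeds its threshold" rule, and the numeric `warn`/`high` thresholds are all assumptions chosen for illustration. Whether they correspond to the $(\tau^{w},\tau^{h})$ pair mentioned in the abstract is likewise an assumption.

```python
import numpy as np


def harmonic_features(signal, fs, f0):
    """Estimate (H2/H1, H3/H1, relative frequency drift) from a real trace.

    Hypothetical estimator: Hann-windowed FFT magnitudes, with the
    fundamental taken as the largest peak within +/-20% of the nominal f0.
    """
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Locate the fundamental near the nominal drive frequency.
    band = (freqs > 0.8 * f0) & (freqs < 1.2 * f0)
    k1 = np.flatnonzero(band)[np.argmax(spectrum[band])]
    f1 = freqs[k1]
    h1 = spectrum[k1]

    def peak_near(f):
        # Largest magnitude within one bin of the target frequency.
        k = int(round(f * n / fs))
        lo, hi = max(k - 1, 0), min(k + 2, len(spectrum))
        return spectrum[lo:hi].max()

    drift = abs(f1 - f0) / f0
    return peak_near(2 * f1) / h1, peak_near(3 * f1) / h1, drift


def syndrome_code(h2, h3, drift,
                  warn=(0.05, 0.05, 0.01),   # placeholder warning thresholds
                  high=(0.15, 0.15, 0.03)):  # placeholder high-risk thresholds
    """Map the three features to s in {0, 1, 2} (assumed 'any exceeds' rule)."""
    feats = (h2, h3, drift)
    if any(x >= t for x, t in zip(feats, high)):
        return 2  # high risk
    if any(x >= t for x, t in zip(feats, warn)):
        return 1  # warning
    return 0      # nominal


if __name__ == "__main__":
    fs, f0 = 1000.0, 50.0
    t = np.arange(1000) / fs
    # Synthetic trace with a 10% second harmonic (illustrative only).
    trace = np.sin(2 * np.pi * f0 * t) + 0.1 * np.sin(2 * np.pi * 2 * f0 * t)
    print(syndrome_code(*harmonic_features(trace, fs, f0)))
```

In a controller loop, a code like this would be appended to the RL observation so the policy can condition on hardware risk. In practice the thresholds would come from a calibration procedure such as the receiver-operating-characteristic calibration the abstract describes, not from fixed constants.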