Reinforcement Learning-Based Control Sequence Optimization for Advanced Reactors

Khang H. N. Nguyen, Andy Rivas, G. Delipei, Jason Hou
{"title":"Reinforcement Learning-Based Control Sequence Optimization for Advanced Reactors","authors":"Khang H. N. Nguyen, Andy Rivas, G. Delipei, Jason Hou","doi":"10.3390/jne5030015","DOIUrl":null,"url":null,"abstract":"The last decade has seen the development and application of data-driven methods taking off in nuclear engineering research, aiming to improve the safety and reliability of nuclear power. This work focuses on developing a reinforcement learning-based control sequence optimization framework for advanced nuclear systems, which not only aims to enhance flexible operations, promoting the economics of advanced nuclear technology, but also prioritizing safety during normal operation. At its core, the framework allows the sequence of operational actions to be learned and optimized by an agent to facilitate smooth transitions between the modes of operations (i.e., load-following), while ensuring that all safety significant system parameters remain within their respective limits. To generate dynamic system responses, facilitate control strategy development, and demonstrate the effectiveness of the framework, a simulation environment of a pebble-bed high-temperature gas-cooled reactor was utilized. The soft actor-critic algorithm was adopted to train a reinforcement learning agent, which can generate control sequences to maneuver plant power output in the range between 100% and 50% of the nameplate power through sufficient training. It was shown in the performance validation that the agent successfully generated control actions that maintained electrical output within a tight tolerance of 0.5% from the demand while satisfying all safety constraints. During the mode transition, the agent can maintain the reactor outlet temperature within ±1.5 °C and steam pressure within 0.1 MPa of their setpoints, respectively, by dynamically adjusting control rod positions, control valve openings, and pump speeds. The results demonstrate the effectiveness of the optimization framework and the feasibility of reinforcement learning in designing control strategies for advanced reactor systems.","PeriodicalId":512967,"journal":{"name":"Journal of Nuclear Engineering","volume":"40 3","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Nuclear Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jne5030015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The last decade has seen the development and application of data-driven methods take off in nuclear engineering research, aiming to improve the safety and reliability of nuclear power. This work focuses on developing a reinforcement learning-based control sequence optimization framework for advanced nuclear systems, which aims not only to enhance flexible operations, promoting the economics of advanced nuclear technology, but also to prioritize safety during normal operation. At its core, the framework allows the sequence of operational actions to be learned and optimized by an agent to facilitate smooth transitions between modes of operation (i.e., load-following), while ensuring that all safety-significant system parameters remain within their respective limits. To generate dynamic system responses, facilitate control strategy development, and demonstrate the effectiveness of the framework, a simulation environment of a pebble-bed high-temperature gas-cooled reactor was utilized. The soft actor-critic algorithm was adopted to train a reinforcement learning agent that, given sufficient training, can generate control sequences to maneuver plant power output between 100% and 50% of the nameplate power. Performance validation showed that the agent successfully generated control actions that maintained electrical output within a tight tolerance of 0.5% of the demand while satisfying all safety constraints. During the mode transition, the agent maintained the reactor outlet temperature within ±1.5 °C and the steam pressure within 0.1 MPa of their respective setpoints by dynamically adjusting control rod positions, control valve openings, and pump speeds. The results demonstrate the effectiveness of the optimization framework and the feasibility of reinforcement learning in designing control strategies for advanced reactor systems.
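The abstract does not include code, but a minimal sketch may help illustrate the kind of setup it describes: a load-following environment trained with soft actor-critic. The sketch below uses a gymnasium-style interface with a toy first-order plant model standing in for the paper's pebble-bed HTGR simulator. All class names, dynamics coefficients, and the training call are illustrative assumptions, not the authors' implementation; only the 100%-to-50% power ramp, the ±1.5 °C and 0.1 MPa constraint bands, and the three action types (rod position, valve opening, pump speed) come from the abstract.

```python
# Hypothetical sketch: gymnasium-style load-following environment.
# The plant dynamics below are placeholder first-order updates, NOT the
# HTGR system model used in the paper.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class LoadFollowEnv(gym.Env):
    """Toy environment: track a demand ramp from 100% to 50% of nameplate
    power while keeping outlet temperature and steam pressure deviations
    inside the constraint bands quoted in the abstract."""

    def __init__(self, episode_len=200):
        # Actions: normalized increments to control rod position,
        # control valve opening, and pump speed.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
        # Observations: [power fraction, temp deviation, pressure deviation, demand]
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        self.episode_len = episode_len

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.power = 1.0   # fraction of nameplate power
        self.dT = 0.0      # outlet temperature deviation from setpoint, degC
        self.dP = 0.0      # steam pressure deviation from setpoint, MPa
        return self._obs(), {}

    def _demand(self):
        # Linear ramp from 100% down to 50% of nameplate over the episode.
        return 1.0 - 0.5 * min(self.t / self.episode_len, 1.0)

    def _obs(self):
        return np.array([self.power, self.dT, self.dP, self._demand()],
                        dtype=np.float32)

    def step(self, action):
        rod, valve, pump = np.clip(action, -1.0, 1.0)
        # Placeholder coupled dynamics; the paper obtains these responses
        # from a full HTGR simulation environment instead.
        self.power += 0.01 * rod
        self.dT += 0.3 * rod - 0.2 * pump
        self.dP += 0.02 * valve - 0.01 * pump
        self.t += 1

        # Reward: demand-tracking error plus soft penalties outside the
        # +-1.5 degC and +-0.1 MPa bands from the abstract.
        track_err = abs(self.power - self._demand())
        penalty = max(abs(self.dT) - 1.5, 0.0) + 10.0 * max(abs(self.dP) - 0.1, 0.0)
        reward = -track_err - penalty
        return self._obs(), reward, False, self.t >= self.episode_len, {}


# One possible training call, using stable-baselines3's SAC implementation
# (an assumption -- the paper does not state which SAC implementation it used):
#   from stable_baselines3 import SAC
#   model = SAC("MlpPolicy", LoadFollowEnv(), verbose=0)
#   model.learn(total_timesteps=100_000)
```

The constraint handling here is a simple soft-penalty term in the reward; the paper's framework may shape or enforce its safety limits differently, but the pattern of combining a tracking objective with constraint penalties is a common way to encode "follow demand while staying within limits" for an off-policy agent such as SAC.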