Monte Carlo tree search-based deep reinforcement learning for flexible operation & maintenance optimization of a nuclear power plant

Journal of Safety and Sustainability, Pub Date: 2024-03-01, DOI: 10.1016/j.jsasus.2023.08.001

Zhaojun Hao, Francesco Di Maio, Enrico Zio

Volume 1, Issue 1, Pages 4-13. Full text: https://www.sciencedirect.com/science/article/pii/S294992672300001X


Abstract

Nuclear power plants (NPPs) are required to follow a flexible, profitable production plan while guaranteeing high safety standards. Deep reinforcement learning (DRL) is an effective method for finding the most profitable operation and maintenance (O&M) strategy for a complex system. However, DRL driven by profit alone neglects safety-related issues. In this paper, we propose a DRL approach that solves single-objective sequential decision problems (SOSDPs) and multi-objective sequential decision problems (MOSDPs) to find O&M strategies that trade off reliability and profit. The combinatorial problem of training the RL agent to search for the optimal solution is addressed with Monte Carlo tree search (MCTS), whose performance is compared with that of the traditionally adopted proximal policy optimization (PPO) and imitation learning (IL). A case study is considered for demonstration.
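To make the idea of tree search over O&M decisions concrete, the following is a minimal sketch, not the authors' method. It runs plain UCT-style MCTS (no neural network) on a hypothetical single-component problem in which wear grows by one level per operating step and maintenance resets it. Everything in it is an assumption made for illustration: the wear model, the costs, the horizon, the exploration constant `c`, and the weighted-sum scalarization of profit and a failure penalty through `w_profit`, which is only one possible way to express a reliability/profit trade-off.

```python
import math
import random

OPERATE, MAINTAIN = 0, 1
MAX_WEAR = 3   # hypothetical wear level at which the component has failed
HORIZON = 6    # hypothetical number of decision steps


def step(wear, action, w_profit):
    """Toy deterministic transition: returns (next_wear, scalarized_reward)."""
    if action == MAINTAIN:
        profit, penalty, nxt = -0.5, 0.0, 0            # pay a cost, reset wear
    else:
        nxt = min(wear + 1, MAX_WEAR)
        profit = 1.0 if wear < MAX_WEAR else 0.0       # a failed unit earns nothing
        penalty = 1.0 if nxt == MAX_WEAR else 0.0      # failure counts against reliability
    return nxt, w_profit * profit - (1.0 - w_profit) * penalty


class Node:
    def __init__(self, wear, t):
        self.wear, self.t = wear, t
        self.children = {}   # action -> (immediate reward, child Node)
        self.visits = 0
        self.total = 0.0     # sum of returns-to-go sampled through this node


def rollout(wear, t, w_profit, rng):
    """Random-policy return from (wear, t) to the end of the horizon."""
    total = 0.0
    while t < HORIZON:
        wear, r = step(wear, rng.choice((OPERATE, MAINTAIN)), w_profit)
        total += r
        t += 1
    return total


def ucb(node, action, c):
    """Upper confidence bound of an already expanded action."""
    r, child = node.children[action]
    mean = r + child.total / child.visits
    return mean + c * math.sqrt(math.log(node.visits) / child.visits)


def simulate(node, w_profit, rng, c=1.4):
    """One MCTS iteration: select, expand, roll out, back up."""
    if node.t == HORIZON:
        ret = 0.0
    else:
        untried = [a for a in (OPERATE, MAINTAIN) if a not in node.children]
        if untried:                                    # expansion + rollout
            a = rng.choice(untried)
            nxt, r = step(node.wear, a, w_profit)
            child = Node(nxt, node.t + 1)
            node.children[a] = (r, child)
            tail = rollout(nxt, child.t, w_profit, rng)
            child.visits += 1
            child.total += tail
            ret = r + tail
        else:                                          # UCB selection
            a = max(node.children, key=lambda x: ucb(node, x, c))
            r, child = node.children[a]
            ret = r + simulate(child, w_profit, rng, c)
    node.visits += 1                                   # backup
    node.total += ret
    return ret


def mcts_action(wear, w_profit=0.5, iterations=4000, seed=0):
    """Return the most-visited root action after a fixed search budget."""
    rng = random.Random(seed)
    root = Node(wear, 0)
    for _ in range(iterations):
        simulate(root, w_profit, rng)
    return max(root.children, key=lambda a: root.children[a][1].visits)
```

In this toy setting, calling `mcts_action(2, w_profit=0.5)` searches from a component one step away from failure with equal weight on profit and reliability, while `mcts_action(2, w_profit=1.0)` uses profit alone. Comparing the two calls gives a small-scale picture of why a purely profit-driven agent can neglect safety, although the numbers carry no meaning beyond this illustration.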
