Monte Carlo tree search-based deep reinforcement learning for flexible operation & maintenance optimization of a nuclear power plant
Authors: Zhaojun Hao, Francesco Di Maio, Enrico Zio
Journal: Journal of Safety and Sustainability, Volume 1, Issue 1, Pages 4-13
Publication date: 2024-03-01
Publication type: Journal Article
DOI: 10.1016/j.jsasus.2023.08.001
URL: https://www.sciencedirect.com/science/article/pii/S294992672300001X
PDF: https://www.sciencedirect.com/science/article/pii/S294992672300001X/pdfft?md5=ee7ea82ca8f59dc3b8f715901a3d3437&pid=1-s2.0-S294992672300001X-main.pdf
Citations: 0
Abstract
Nuclear power plants (NPPs) are required to follow a flexible, profitable production plan while guaranteeing high safety standards. Deep reinforcement learning (DRL) is an effective method for finding the most profitable operation and maintenance (O&M) strategy for a complex system. However, DRL driven by profit alone neglects safety-related issues. In this paper, we propose a DRL approach that solves single-objective sequential decision problems (SOSDPs) and multi-objective sequential decision problems (MOSDPs) to find O&M strategies that trade off reliability and profit. The combinatorial problem that arises when training the RL agent to search for the optimal solution is addressed by Monte Carlo tree search (MCTS), whose performance is compared with that of the traditionally adopted proximal policy optimization (PPO) and imitation learning (IL). A case study is presented for demonstration.
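The abstract does not specify the authors' formulation, so the following is only a minimal sketch of how MCTS with a UCT selection rule can pick O&M actions in a sequential decision problem that weighs profit against reliability. Everything in it is a hypothetical illustration rather than the paper's model: the single-component degradation states, the 0.4 degradation probability, the revenue and maintenance cost, the weight `w` that scalarizes the two objectives, and all function names. Search statistics are keyed by `(state, time step)` for brevity instead of a full tree.

```python
import math
import random

ACTIONS = (0, 1)   # 0 = operate, 1 = maintain (hypothetical action set)
FAILED = 3         # degradation levels 0..3, where 3 means failed
HORIZON = 6        # number of decision steps in the toy problem


def step(state, action, rng, w=0.5):
    """Toy transition: returns (next_state, scalarized reward)."""
    if action == 1:                 # maintain: pay a cost, restore to level 0
        profit, next_state = -0.5, 0
    elif state == FAILED:           # operating a failed unit earns nothing
        profit, next_state = 0.0, FAILED
    else:                           # operate: earn revenue, may degrade one level
        profit = 1.0
        next_state = state + 1 if rng.random() < 0.4 else state
    reliability = -1.0 if next_state == FAILED else 0.0
    # Weighted sum of the two objectives; w = 1.0 is the profit-only case.
    return next_state, w * profit + (1 - w) * reliability


def rollout(state, t, rng, w):
    """Estimate the remaining return with uniformly random actions."""
    total = 0.0
    while t < HORIZON:
        state, reward = step(state, rng.choice(ACTIONS), rng, w)
        total += reward
        t += 1
    return total


def simulate(state, t, stats, rng, w, c):
    """One MCTS iteration: selection, expansion, rollout, backup."""
    if t == HORIZON:
        return 0.0
    key = (state, t)
    if key not in stats:            # expansion: new node, then random rollout
        stats[key] = {a: [0, 0.0] for a in ACTIONS}   # [visits, mean return]
        return rollout(state, t, rng, w)
    node = stats[key]
    n_total = sum(n for n, _ in node.values())

    def ucb(a):
        n, q = node[a]
        if n == 0:
            return math.inf         # try every action at least once
        return q + c * math.sqrt(math.log(n_total) / n)

    action = max(ACTIONS, key=ucb)  # UCT selection
    next_state, reward = step(state, action, rng, w)
    ret = reward + simulate(next_state, t + 1, stats, rng, w, c)
    node[action][0] += 1            # backup: incremental mean of returns
    node[action][1] += (ret - node[action][1]) / node[action][0]
    return ret


def mcts_action(state, t=0, w=0.5, n_sim=2000, c=1.4, seed=0):
    """Run n_sim simulations and return the most visited root action."""
    rng = random.Random(seed)
    stats = {}
    for _ in range(n_sim):
        simulate(state, t, stats, rng, w, c)
    root = stats[(state, t)]
    return max(ACTIONS, key=lambda a: root[a][0])
```

Calling `mcts_action(FAILED)` asks the search what to do with a failed component under equal weighting, while `mcts_action(0, w=1.0)` evaluates a healthy component under a profit-only objective. Varying `w` is one simple way to explore the reliability-profit trade-off in this toy setting; a multi-objective method such as the one the paper describes may handle the trade-off differently.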