Authors: Zhaojun Hao, Francesco Di Maio, Enrico Zio
Journal: Journal of Safety and Sustainability, Volume 1, Issue 1, Pages 4-13
Publication date: 2024-03-01
Publication type: Journal Article
DOI: 10.1016/j.jsasus.2023.08.001
URL: https://www.sciencedirect.com/science/article/pii/S294992672300001X
Monte Carlo tree search-based deep reinforcement learning for flexible operation & maintenance optimization of a nuclear power plant
Nuclear power plants (NPPs) are required to follow a flexible, profitable production plan while guaranteeing high safety standards. Deep reinforcement learning (DRL) is an effective method for finding the most profitable operation and maintenance (O&M) strategy for a complex system. However, DRL driven only by profit neglects safety-related issues. In this paper, we propose a DRL approach to solve single-objective sequential decision problems (SOSDPs) and multi-objective sequential decision problems (MOSDPs) to find O&M strategies that trade off reliability and profit. The combinatorial problem that arises when training the RL agent to search for the optimal solution is addressed by Monte Carlo tree search (MCTS), whose performance is compared with that of the traditionally adopted proximal policy optimization (PPO) and imitation learning (IL). A case study is considered for demonstration.
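To make the role of MCTS in a sequential O&M decision problem more concrete, the following is a minimal sketch of plain UCT-style tree search on a toy single-component problem. It is not the authors' method: the abstract does not give the plant model, the reward, or how the search is coupled to a neural network, so everything here is an assumption for illustration. In particular, the deterministic `step` model, the weight `w` that scalarizes profit and a reliability proxy into one reward, the six-step `HORIZON`, and all numeric constants are hypothetical, and random rollouts stand in for any learned value or policy network.

```python
import math
import random

HORIZON = 6        # assumed number of decision steps
ACTIONS = (0, 1)   # 0 = operate, 1 = maintain


def step(age, action, w=0.5):
    """Hypothetical deterministic model: return (next_age, scalarized reward)."""
    if action == 1:   # maintenance: costs money, renews the component
        profit, next_age = -0.5, 0
    else:             # operation: revenue shrinks as the component ages
        profit, next_age = max(0.0, 1.0 - 0.25 * age), age + 1
    reliability = math.exp(-0.3 * next_age)   # assumed reliability proxy
    return next_age, w * profit + (1.0 - w) * reliability


class Node:
    def __init__(self, age, t):
        self.age, self.t = age, t
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # sum of returns observed from the incoming action onward


def uct(parent, child, c):
    """Mean return of the child plus the UCB1 exploration bonus."""
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def rollout(age, t, rng, w):
    """Finish the episode with uniformly random actions."""
    total = 0.0
    while t < HORIZON:
        age, reward = step(age, rng.choice(ACTIONS), w)
        total += reward
        t += 1
    return total


def mcts(root_age, iterations=2000, c=1.4, w=0.5, seed=0):
    """Return the most-visited first action (0 = operate, 1 = maintain)."""
    rng = random.Random(seed)
    root = Node(root_age, 0)
    for _ in range(iterations):
        node, path = root, []   # path stores (visited node, reward of the action leading to it)
        # 1) Selection: descend with UCT while the node is fully expanded
        while node.t < HORIZON and len(node.children) == len(ACTIONS):
            action = max(ACTIONS, key=lambda a: uct(node, node.children[a], c))
            _, reward = step(node.age, action, w)
            node = node.children[action]
            path.append((node, reward))
        # 2) Expansion: add one untried action unless the node is terminal
        if node.t < HORIZON:
            action = rng.choice([a for a in ACTIONS if a not in node.children])
            next_age, reward = step(node.age, action, w)
            node.children[action] = Node(next_age, node.t + 1)
            node = node.children[action]
            path.append((node, reward))
        # 3) Simulation: random rollout to the end of the horizon
        ret = rollout(node.age, node.t, rng, w)
        # 4) Backpropagation: accumulate the return-to-go along the path
        root.visits += 1
        for visited, reward in reversed(path):
            ret += reward
            visited.visits += 1
            visited.value += ret
    return max(root.children, key=lambda a: root.children[a].visits)
```

Calling `mcts(age)` returns the first action the search prefers for a component of the given age. Moving `w` toward 1 weights the toy profit term more heavily, and moving it toward 0 weights the reliability proxy; a single scalar weight is only one simple way to combine two objectives and is used here purely for illustration.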