A soft actor-critic reinforcement learning-based method for remaining useful life prediction

IF 9.4, Zone 1 (Engineering Technology), Q1 ENGINEERING, INDUSTRIAL

Reliability Engineering & System Safety Pub Date : 2025-04-08 DOI:10.1016/j.ress.2025.111121

Shousheng Ding, Lei Meng, Jie Shang, Chen Jiang, Haobo Qiu, Liang Gao

Volume 261, Article 111121. Full text: https://www.sciencedirect.com/science/article/pii/S0951832025003229

Citations: 0

Abstract

Remaining useful life (RUL) prediction plays a crucial role in condition management and maintenance planning for manufacturing equipment. Data-driven deep learning methods have made significant advances in this field. However, traditional approaches have not adequately considered the temporal correlations in both sensor data and RUL prediction values during equipment degradation. Existing reinforcement learning (RL) methods face challenges such as a lack of sufficient lifespan variation information in the state variables, neglect of dynamic changes in prediction error in the reward function design, and fixed interaction termination conditions that cannot effectively promote the agent's learning of equipment degradation information. Therefore, this paper proposes an RL model based on the soft actor-critic (SAC) algorithm. First, an autoencoder extracts key features from the data collected by sensors. These key features, along with multi-dimensional lifespan features containing information from multiple historical time steps, are then used to construct the RL state variables. Next, a reward function is formulated that takes error gradients into account. Finally, a progressive early stopping method is proposed to train the model. Extensive experiments on the CMAPSS dataset and the XJTU-SY bearing dataset show that the proposed method achieves higher prediction accuracy than mainstream approaches.
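The abstract does not give the exact state or reward definitions, so the following is only a minimal sketch of the general idea, under stated assumptions. It shows one plausible way to combine encoded sensor features with a short history of RUL predictions into a state vector, and one plausible reward that penalizes both the prediction error and its change between steps. The function names `build_state` and `reward`, the window length `k`, and the weights `alpha` and `beta` are all hypothetical and do not come from the paper.

```python
import numpy as np


def build_state(encoded_features, rul_history, k=3):
    """Concatenate encoded sensor features with the k most recent RUL predictions.

    Hypothetical state layout; the paper's actual lifespan features may differ.
    """
    history = np.asarray(rul_history[-k:], dtype=float)
    # Left-pad with the earliest available value when fewer than k predictions exist.
    if history.size < k:
        pad_value = history[0] if history.size else 0.0
        history = np.concatenate([np.full(k - history.size, pad_value), history])
    return np.concatenate([np.asarray(encoded_features, dtype=float), history])


def reward(pred, target, prev_error, alpha=1.0, beta=0.5):
    """Penalize the absolute error and its growth since the previous step.

    Hypothetical reward shape; alpha and beta are illustrative weights.
    Returns the reward and the current error (to pass in as prev_error next step).
    """
    error = abs(pred - target)
    error_change = error - prev_error  # positive when the prediction got worse
    return -(alpha * error + beta * error_change), error
```

In this sketch, a prediction whose error shrinks relative to the previous step is penalized less than one with the same error that is getting worse, which is one simple way to make the reward sensitive to the trend of the error rather than only its magnitude.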




Source journal

Reliability Engineering & System Safety (Management Science - Engineering: Industrial)

CiteScore

15.20

Self-citation rate

39.50%

Articles published

621

Review time

67 days

About the journal: Elsevier publishes Reliability Engineering & System Safety in association with the European Safety and Reliability Association and the Safety Engineering and Risk Analysis Division. The international journal is devoted to developing and applying methods to enhance the safety and reliability of complex technological systems, such as nuclear power plants, chemical plants, hazardous waste facilities, space systems, offshore and maritime systems, transportation systems, constructed infrastructure, and manufacturing plants. The journal normally publishes only articles that involve the analysis of substantive problems related to the reliability of complex systems or present techniques and/or theoretical results that have a discernible relationship to the solution of such problems. An important aim is to balance academic material and practical applications.