Reinforcement learning based early classification framework for power transformer differential protection
Xiaopeng Wang, Anyang He, Zongbo Li, Zaibin Jiao, Na Lu
Expert Systems with Applications, Volume 292, Article 128632. Published 2025-06-15.
DOI: 10.1016/j.eswa.2025.128632
URL: https://www.sciencedirect.com/science/article/pii/S0957417425022511
Journal impact factor: 7.5; JCR: Q1 (Computer Science, Artificial Intelligence)
Abstract
Balancing response speed against diagnostic accuracy is a critical concern in transformer protection. Prevailing AI-based transformer protection methods, however, tend to extract electrical-quantity information from data windows of fixed length, which prevents a prompt response when discriminative fault features emerge early. This study formulates transformer protection as a Markov decision process and proposes an Early Classification Proximal Policy Optimization (ECPPO) framework that uses reinforcement learning (RL) to make protection decisions adaptive to data length, acting promptly while maintaining high accuracy. The limited generalization of RL algorithms, however, poses a significant problem in the transformer protection scenario. Because stronger feature extraction is essential to better generalization, ECPPO adopts a two-stage training paradigm to strengthen the policy model. In the first stage, a multi-task deep learning framework trains a feature-extraction module with normalization layers, using fault label information together with a signal reconstruction task to enrich the feature representation. In the second stage, the pre-trained feature-extraction module is transferred to the agent model with its weights frozen, and PPO training is performed. Additionally, to use samples more efficiently, a period-circle-shift data augmentation method is proposed, which enhances generalization by cyclically reconstructing data in periodic sequences. To validate the proposed framework, a series of experiments was conducted using simulation data generated by PSCAD/EMTDC as the training data and practical data from an experimental transformer system as the testing data. The experimental results show a testing accuracy of 99.19 % with an average response time of 12.10 ms, indicating that the ECPPO algorithm both achieves superior accuracy and reduces the average response time. The results also highlight its robust generalization when transferring from simulation to experimental systems.
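The abstract does not spell out how the period-circle-shift augmentation is implemented. As a rough illustration only, the sketch below assumes it amounts to circularly shifting a one-cycle window of a periodic signal, so that the same waveform is seen starting at different phases. The function names `circular_shift` and `augment_cycle`, the 20-samples-per-cycle sine, and the shift step are all hypothetical choices made for this example and are not taken from the paper.

```python
import math
from typing import List, Sequence


def circular_shift(cycle: Sequence[float], k: int) -> List[float]:
    """Rotate one period of samples left by k positions (wrapping around)."""
    n = len(cycle)
    if n == 0:
        return []
    k %= n
    return list(cycle[k:]) + list(cycle[:k])


def augment_cycle(cycle: Sequence[float], step: int) -> List[List[float]]:
    """Return shifted copies of `cycle`, one for every `step` samples of offset.

    Assumption: each copy stands in for the same periodic waveform observed
    from a different starting phase.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return [circular_shift(cycle, k) for k in range(0, len(cycle), step)]


# Hypothetical input: one cycle of a sine wave sampled at 20 points.
SAMPLES_PER_CYCLE = 20
sine = [math.sin(2 * math.pi * i / SAMPLES_PER_CYCLE) for i in range(SAMPLES_PER_CYCLE)]

# Shifting in steps of 5 samples gives four phase-shifted variants.
variants = augment_cycle(sine, step=5)
```

Under this assumption, a shift of a quarter cycle turns the sine window into a cosine window, so one recorded cycle yields several training windows with different inception phases. Real fault records are only approximately periodic, so how the paper handles transients within the window may differ from this sketch.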
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.