Reinforcement learning based early classification framework for power transformer differential protection

IF 7.5, Zone 1 (Computer Science), JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Expert Systems with Applications, Pub Date: 2025-06-15, DOI: 10.1016/j.eswa.2025.128632

Xiaopeng Wang, Anyang He, Zongbo Li, Zaibin Jiao, Na Lu

Expert Systems with Applications, Volume 292, Article 128632. Full text: https://www.sciencedirect.com/science/article/pii/S0957417425022511

Citations: 0

Abstract

Balancing response speed against diagnostic accuracy is a critical concern in transformer protection. However, prevailing AI-based transformer protection methods tend to extract electrical quantity information from a fixed data length, which prevents a prompt response when discriminative fault features emerge in the early stages. This study formulates transformer protection as a Markov decision process and proposes an Early Classification Proximal Policy Optimization (ECPPO) framework that uses reinforcement learning (RL) for data-length-adaptive transformer protection with timely action and notably high accuracy. The limited generalization of RL algorithms, however, poses a significant issue in the transformer protection scenario. Because enhancing a model's feature extraction capability is essential for improving its generalization ability, ECPPO constructs a two-stage training paradigm to augment the policy model accordingly. In the first stage, a multi-task deep learning framework trains a feature-extraction module with normalization layers, using fault label information and a signal reconstruction task to enrich the feature representation. In the second stage, the pre-trained feature-extraction module is transferred to the agent model with frozen weights, and PPO training is performed. Additionally, to improve sample utilization efficiency, a period-circle-shift data augmentation method is proposed, which enhances generalization by cyclically reconstructing data in periodic sequences. To validate the proposed framework, a series of experiments was conducted using simulation data generated by PSCAD/EMTDC software as the training data and practical data generated by an experimental transformer system as the testing data.
The experimental results demonstrate a significantly enhanced testing accuracy of 99.19%, coupled with an average response time of 12.10 ms, indicating that the ECPPO algorithm not only achieves superior accuracy but also effectively reduces the average response time. Furthermore, the results highlight its robust generalization capability when transferring from simulation to experimental systems.
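
The abstract does not give implementation details, so the following is only a minimal sketch of how early classification might be posed as a Markov decision process, together with one possible reading of the period-circle-shift augmentation as a circular shift of a one-period window. The action coding (`WAIT`, `NORMAL`, `FAULT`), the class `EarlyClassificationEnv`, the reward values, the `stride` parameter, and the function `period_circle_shift` are all assumptions introduced for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical action coding: keep observing, or commit to a class.
WAIT, NORMAL, FAULT = 0, 1, 2


def period_circle_shift(window, shift):
    """Circularly shift a one-period window along the time axis.

    Assumed interpretation of the augmentation named in the abstract:
    samples pushed past the end wrap around to the start.
    """
    return np.roll(window, shift, axis=-1)


class EarlyClassificationEnv:
    """Toy episodic MDP: the agent sees a growing prefix of a signal and
    decides at each step whether to wait for more data or to classify."""

    def __init__(self, signal, label, stride=4, wait_penalty=0.01):
        self.signal = np.asarray(signal, dtype=float)
        self.label = label                # NORMAL or FAULT
        self.stride = stride              # samples revealed per decision
        self.wait_penalty = wait_penalty  # small cost for each delay
        self.t = 0                        # number of samples revealed

    def _obs(self):
        # Zero-pad the unseen part so the observation shape stays fixed.
        obs = np.zeros_like(self.signal)
        obs[..., :self.t] = self.signal[..., :self.t]
        return obs

    def reset(self):
        self.t = self.stride
        return self._obs()

    def step(self, action):
        n = self.signal.shape[-1]
        if action == WAIT:
            if self.t >= n:
                # No data left to wait for: treat as a failed decision.
                return self._obs(), -1.0, True
            self.t = min(self.t + self.stride, n)
            return self._obs(), -self.wait_penalty, False
        # Classification ends the episode; reward depends on correctness.
        reward = 1.0 if action == self.label else -1.0
        return self._obs(), reward, True
```

In this sketch the per-step waiting penalty is what creates the trade-off between speed and accuracy: a policy trained with PPO on such an environment would be rewarded for classifying as soon as the revealed prefix is discriminative enough. Feeding `period_circle_shift(signal, k)` for several values of `k` into the environment would be one way to reuse each recorded period at different starting phases.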


Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months

Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.