Text classification for distribution substation inspection based on BERT-TextRCNN model

IF 2.6 · Zone 4 (Engineering & Technology) · Q3 · ENERGY & FUELS

Frontiers in Energy Research Pub Date : 2024-07-31 DOI:10.3389/fenrg.2024.1411654

Lu Jiangang, Zhao Ruifeng, Yu Zhiwen, Dai Yue, Shu Jiawei, Yang Ting


Citations: 0

Abstract

With the advancement of source-load interaction in new power systems, data-driven approaches have become a foundation for aggregating sources and loads and coordinating their interaction. However, with the widespread integration of distributed energy resources, fine-grained sensing by intelligent devices, and the inherent stochasticity of source-load dynamics, a massive amount of raw data is being recorded and accumulated in the data center. Valuable information is often dispersed across different paragraphs of the raw data, which makes it difficult to extract. Distribution substation inspection plays a crucial role in ensuring the safe operation of the power system. Traditional classification of inspection report text typically relies on manual judgment and accumulated experience, resulting in low efficiency and a significant misjudgment rate. This paper therefore proposes a text classification method for inspection reports based on a pre-trained BERT-TextRCNN model. By using a dense connection between the BERT embedding layer and the neural network, the proposed method improves the accuracy of matching long texts. The study collected 2,831 maintenance records for the first quarter of 2023 from the distribution room, comprising approximately 58 environmental testing records, 738 environmental box testing records, approximately 672 distribution room testing records, and approximately 1,363 box-type substation testing records, and built a text corpus from them for the experiments. Experimental results show that the proposed model automatically classifies a large volume of manually recorded inspection report data by time, location, and fault, achieving a classification accuracy of 94.7%, precision of 92%, recall of 92%, and F1 score of 90.3%.
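The abstract reports accuracy, precision, recall, and F1 for a four-category corpus but does not say how the per-class scores were averaged. As a minimal sketch, assuming macro averaging (an assumption, not something the abstract states), the metrics could be computed as below. The category keys are hypothetical identifiers for the four record types named in the abstract; the counts are the ones the abstract gives, and the labels in the usage note are toy data, not the paper's results.

```python
from collections import Counter

# Record counts per category as stated in the abstract.
# The dictionary keys are hypothetical identifiers chosen for this sketch.
CLASS_COUNTS = {
    "environmental": 58,
    "environmental_box": 738,
    "distribution_room": 672,
    "box_substation": 1363,
}


def class_shares(counts):
    """Return each category's fraction of the whole corpus."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption of this sketch: every class
    contributes equally, regardless of how many records it has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For example, `macro_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])` scores a toy two-class prediction, and `class_shares(CLASS_COUNTS)` shows how unevenly the 2,831 records are spread across the four categories, which is why the choice of averaging scheme matters when reading the reported figures.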




Source journal

Frontiers in Energy Research (Economics, Econometrics and Finance: Economics and Econometrics)

CiteScore: 3.90
Self-citation rate: 11.80%
Articles published: 1727
Review time: 12 weeks

Journal description: Frontiers in Energy Research makes use of the unique Frontiers platform for open-access publishing and research networking for scientists, which provides an equal opportunity to seek, share and create knowledge. The mission of Frontiers is to place publishing back in the hands of working scientists and to promote an interactive, fair, and efficient review process. Articles are peer-reviewed according to the Frontiers review guidelines, which evaluate manuscripts on objective editorial criteria