Power Entity Information Recognition Method Based on Bi-LSTM+CRF

Junhua Hu, Chen Jiang, Guo-ming Ma, Jing Ding, Yawen Wang, Jiquan Xu, Yuan Wang
2021 International Conference on Advanced Electrical Equipment and Reliable Operation (AEERO), pp. 1-5. Published 2021-10-15.
DOI: 10.1109/AEERO52475.2021.9708243 (https://doi.org/10.1109/AEERO52475.2021.9708243)
Citations: 2

Abstract

Over long-term operation, power companies have accumulated a large amount of text data written in natural language. In this data, entities such as faulty equipment, locations, and phenomena are the key semantic units. Extracting these entities is the foundation for constructing electrical knowledge graphs and for fault diagnosis. However, because text data is unstructured, entities are difficult to extract automatically. This paper introduces a deep learning model for power entity extraction that uses each Chinese character as the basic unit and employs the Word2vec model to convert each character into a vector containing contextual semantic information. These vectors are then fed into a Bi-LSTM (Bidirectional Long Short-Term Memory) network. In addition, the transition probabilities between tags are taken into account in the Conditional Random Field (CRF) layer, so that the best tag sequence for the sentence is obtained. In the case analysis, entities were extracted from 550 items of onsite text data. The F1 value of the extraction results was higher than that of a scheme based on a power dictionary and matching.
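
The CRF step described in the abstract, which combines per-character scores with tag-to-tag transition scores to pick the best tag sequence, is commonly decoded with the Viterbi algorithm. The following is a minimal sketch of that decoding step only, not the authors' implementation. The tag names (`O`, `B-EQU`, `I-EQU`), the score values, and the function name `viterbi_decode` are illustrative assumptions; in a full model the emission scores would come from the Bi-LSTM and the transition scores would be learned.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at character position t
                        (assumed to come from a Bi-LSTM).
    transitions[i][j] : score of moving from tag i to tag j
                        (the CRF transition matrix).
    """
    if not emissions:
        return []
    num_tags = len(transitions)
    # best[j] holds the best score of any path ending in tag j so far.
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Pick the previous tag that gives the best score into tag j.
            prev = max(range(num_tags),
                       key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical tag set and scores, chosen only for illustration.
tags = ["O", "B-EQU", "I-EQU"]
transitions = [
    [0.0, 0.0, -100.0],  # from O: O -> I-EQU is heavily penalized
    [0.0, 0.0, 0.0],     # from B-EQU
    [0.0, 0.0, 0.0],     # from I-EQU
]
emissions = [
    [1.0, 0.8, 0.0],     # character 1: emission alone slightly prefers O
    [0.0, 0.0, 2.0],     # character 2: emission strongly prefers I-EQU
]
decoded = [tags[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)  # ['B-EQU', 'I-EQU']
```

Choosing the best tag per character independently would give `O` followed by `I-EQU`, which is not a valid entity span. With the transition penalty included, the decoder prefers `B-EQU` followed by `I-EQU`, which illustrates why transition scores are considered jointly with the per-character scores.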