Power Entity Information Recognition Method Based on Bi-LSTM+CRF
Junhua Hu, Chen Jiang, Guo-ming Ma, Jing Ding, Yawen Wang, Jiquan Xu, Yuan Wang
2021 International Conference on Advanced Electrical Equipment and Reliable Operation (AEERO), pp. 1-5, published 2021-10-15
DOI: 10.1109/AEERO52475.2021.9708243
Citations: 2
Abstract
Over long-term operation, power companies have accumulated a large volume of text data written in natural language. In these texts, entities such as faulty equipment, locations, and phenomena are the key semantic units, and extracting them is the groundwork for building electrical knowledge graphs and for fault diagnosis. Because the text is unstructured, however, these entities are difficult to extract automatically. This paper introduces a deep learning model for power entity extraction that uses each Chinese character as the basic unit and employs a Word2vec model to convert each character into a vector carrying contextual semantic information. The character vectors are then fed into a Bi-LSTM (Bidirectional Long Short-Term Memory) network. A Conditional Random Field (CRF) layer additionally takes the transition probabilities between tags into account, so that the best tag sequence for the whole sentence is obtained. In a case study, entities were extracted from 550 on-site text records, and the F1 score of the extraction results was higher than that of a scheme based on a power dictionary and matching.
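In a Bi-LSTM+CRF tagger, the CRF's "best tag sequence" is commonly found with the Viterbi algorithm, which combines per-character emission scores with tag-to-tag transition scores. The abstract does not name the decoding algorithm, so the following is only a minimal Python sketch under that assumption: the function name `viterbi_decode`, the tag set `O`/`B-EQU`/`I-EQU`, and all scores are hypothetical, and the emission scores stand in for the Bi-LSTM output.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag index sequence and its total score.

    emissions[t][j]   : score of tag j at character t (hypothetically, Bi-LSTM output)
    transitions[i][j] : score of moving from tag i to tag j (learned by the CRF)
    start[j]          : score of beginning the sentence with tag j
    Assumes the sentence has at least one character.
    """
    n_tags = len(start)
    # best[j]: score of the best path that ends in tag j at the current character
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # pick the previous tag i that maximises best[i] + transitions[i][j]
            i_best = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # start from the best final tag and follow the back-pointers to the beginning
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical example: three characters, three tags.
TAGS = ["O", "B-EQU", "I-EQU"]
NEG = -10.0  # large penalty for an illegal move
start = [0.0, 0.0, NEG]  # a sentence cannot start inside an entity
transitions = [
    [0.0, 0.0, NEG],  # O -> I-EQU is illegal
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [0.1, 1.0, 1.2],
    [0.2, 0.1, 1.0],
    [1.0, 0.0, 0.3],
]
path, score = viterbi_decode(emissions, transitions, start)
print([TAGS[i] for i in path], score)  # prints ['B-EQU', 'I-EQU', 'O'] 3.0
```

Picking the highest emission at each character independently would start the sentence with `I-EQU` (score 1.2), which is not a valid entity; the transition and start scores steer decoding to the consistent sequence `B-EQU I-EQU O`. This is the sense in which a CRF layer adds sentence-level consistency on top of the Bi-LSTM.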
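The abstract reports an F1 value but does not say how predicted and reference entities are matched. A widely used convention in named entity recognition, assumed here, is exact entity-level matching: a predicted entity counts as correct only if both its type and its character span equal a gold entity. The Python sketch below treats each Chinese character as one token, decodes BIO tags into spans, and computes micro-averaged F1. The BIO scheme, the entity types `EQU` (equipment) and `PHE` (phenomenon), the example sentence, and the function names are all hypothetical illustrations rather than the paper's annotation scheme.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert per-character BIO tags into a set of typed entity spans."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    begin = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        # close the open entity unless this tag continues it
        if etype is not None and not (tag.startswith("I-") and tag[2:] == etype):
            spans.add((etype, begin, i))
            etype = None
        if tag.startswith("B-"):
            etype, begin = tag[2:], i
        # an I- tag with no matching open entity is ignored (a simplifying choice)
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact (type, start, end) entity matches."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical record: "main transformer oil temperature too high", one tag per character.
chars = list("主变压器油温过高")
gold_tags = ["B-EQU", "I-EQU", "I-EQU", "I-EQU", "B-PHE", "I-PHE", "I-PHE", "I-PHE"]
pred_tags = ["B-EQU", "I-EQU", "I-EQU", "I-EQU", "B-PHE", "I-PHE", "O", "O"]
print(entity_f1([gold_tags], [pred_tags]))  # prints 0.5
```

Here the equipment entity is matched exactly, while the predicted phenomenon covers only two of its four characters and therefore counts as wrong, giving precision 0.5, recall 0.5, and F1 0.5. Other matching rules (for example, partial-overlap credit) would give different numbers.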