Domain-adaptation-based named entity recognition with information enrichment for equipment fault knowledge graph
Authors: Dengrui Xiong, Xinyu Li, Liang Gao, Yiping Gao
Journal: IET Collaborative Intelligent Manufacturing, Volume 6, Issue 4
Publication date: 2024-11-25
Publication type: Journal Article
DOI: 10.1049/cim2.70003
URL: https://onlinelibrary.wiley.com/doi/10.1049/cim2.70003
Impact factor: 2.5 (JCR Q2, Engineering, Industrial)
Citations: 0
Abstract
Equipment diagnosis and maintenance (D&M) generates numerous files, such as records and logs, that contain large amounts of unstructured plain text. The knowledge in these files could be reused for similar equipment faults, but in practice knowledge presented as plain text is hard to acquire. Automated named entity recognition (NER) and relation extraction (RE) methods based on pretrained encoders can be used to extract entities and relations from such text and build a structured knowledge graph (KG), thereby facilitating intelligent manufacturing. However, equipment fault NER performs suboptimally with existing encoders pretrained on general-domain corpora. This paper proposes domain-adaptation-based NER with information enrichment for developing an equipment fault KG. A domain-adapted encoder is tailored for equipment fault NER through domain-adaptive pretraining (DAPT). During DAPT, the word segmentation dictionary is updated and the masking approach is adjusted for information enrichment, which helps make the most of the limited domain-specific pretraining corpus. Experimental results show that the domain-adapted encoder improves the NER F1 score by 1.22% over its counterpart pretrained on a general-domain corpus. Furthermore, a reliable and robust question answering (QA) application of the developed equipment fault KG is also shown.
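The abstract does not specify how the segmentation dictionary or the masking approach is changed, so the following is only a minimal sketch of one plausible reading: adding domain terms to the dictionary changes the word boundaries, and masking whole words instead of single characters then hides complete fault terms during pretraining. The segmenter (greedy forward maximum matching), the vocabularies, the example sentence, and the names `segment` and `whole_word_mask` are all assumptions made for illustration and are not taken from the paper.

```python
import random


def segment(text, vocab, max_len=6):
    """Greedy forward maximum matching (an assumed, simplified segmenter).

    Characters not covered by any dictionary entry become one-character words.
    """
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in vocab:
                words.append(piece)
                i += n
                break
    return words


def whole_word_mask(words, mask_prob=0.15, rng=None, mask_token="[MASK]"):
    """Mask every character of a selected word together.

    Returns per-character tokens and labels (None where nothing is predicted).
    """
    rng = rng or random.Random()
    tokens, labels = [], []
    for word in words:
        masked = rng.random() < mask_prob
        for ch in word:
            tokens.append(mask_token if masked else ch)
            labels.append(ch if masked else None)
    return tokens, labels


# Hypothetical vocabularies: the domain dictionary adds equipment fault terms.
general_vocab = {"故障", "温度"}
domain_vocab = general_vocab | {"主轴", "轴承", "过热"}

text = "主轴轴承过热故障"  # hypothetical record: "spindle bearing overheating fault"

print(segment(text, general_vocab))  # ['主', '轴', '轴', '承', '过', '热', '故障']
print(segment(text, domain_vocab))   # ['主轴', '轴承', '过热', '故障']

tokens, labels = whole_word_mask(
    segment(text, domain_vocab), mask_prob=0.5, rng=random.Random(0)
)
print(tokens)
```

With the general dictionary, most fault terms fall apart into single characters, so a masked position can often be guessed from its neighbour inside the same term. With the updated dictionary, a masked unit covers the whole term, which is one way a small domain corpus could yield a harder and more informative pretraining signal.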
About the journal:
IET Collaborative Intelligent Manufacturing is a Gold Open Access journal that focuses on the development of efficient and adaptive production and distribution systems. It aims to meet the ever-changing market demands by publishing original research on methodologies and techniques for the application of intelligence, data science, and emerging information and communication technologies in various aspects of manufacturing, such as design, modeling, simulation, planning, and optimization of products, processes, production, and assembly.
The journal is indexed in COMPENDEX (Elsevier), Directory of Open Access Journals (DOAJ), Emerging Sources Citation Index (Clarivate Analytics), INSPEC (IET), SCOPUS (Elsevier) and Web of Science (Clarivate Analytics).