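Clinical named entity recognition from Chinese electronic medical records using a double-layer annotation model combining a domain dictionary with CRF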

IF 4.5 Q4 Engineering

工程设计学报 Pub Date : 2020-04-01 DOI:10.13374/J.ISSN2095-9389.2019.09.04.004

龚乐君, 张知菲


Citations: 1

Abstract



Clinical named entity recognition from Chinese electronic medical records using a double-layer annotation model combining a domain dictionary with CRF

Electronic medical records, which are written by professional medical personnel, contain a large amount of important clinical information. How to exploit the latent information in electronic medical records has become a major research direction. Chinese electronic medical records are knowledge-intensive, and their data have considerable research value. However, because of the linguistic features of Chinese, their entities are more complex: composite entities are long, sentence components are often omitted, and the boundaries of clinical entities are often unclear. Labeling a corpus also requires a great deal of manpower because of the technical language used in the text. Recognizing Chinese clinical named entities is therefore a hard problem. Considering these characteristics of Chinese electronic medical records, this paper proposes a double-layer annotation model that combines a domain dictionary with a conditional random field (CRF). A medical domain dictionary was constructed by statistical analysis and combined with the CRF to perform labeling at two different granularities. The manually constructed medical domain dictionary recognizes registered (in-dictionary) words with extremely high accuracy, while machine learning can automatically recognize unregistered words; this work integrates the two to exploit both advantages. With the proposed method, diseases, symptoms, drugs, and operations can be recognized from Chinese electronic medical records. On the test dataset, the method obtained a Macro-P of 96.7%, a Macro-R of 97.7%, and a Macro-F1 of 97.2%. Its recognition performance was greatly improved compared with that of a single-layer model. A deep neural network with attention was also analyzed, but it did not perform well because of the size of the domain dataset. The experimental results show the effectiveness of the double-layer annotation model for named entity recognition in Chinese electronic medical records.
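
The abstract does not say how the dictionary layer and the CRF layer are connected. One common arrangement, shown here only as a minimal sketch and not as the authors' implementation, is to tag the text with the dictionary first and pass those tags to the CRF as an extra per-character feature. The toy lexicon, the entity type names, the use of forward maximum matching, and the helper names `dictionary_layer` and `crf_features` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon (surface form -> entity type); the paper's
# dictionary was built by statistical analysis and is not reproduced here.
DOMAIN_DICT: Dict[str, str] = {
    "高血压": "DISEASE",
    "头痛": "SYMPTOM",
    "阿司匹林": "DRUG",
}


def dictionary_layer(text: str, lexicon: Dict[str, str]) -> List[str]:
    """First layer (assumed): forward maximum matching, one BIO tag per character."""
    max_len = max((len(word) for word in lexicon), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate first so that long composite entities win.
        for span in range(min(max_len, len(text) - i), 0, -1):
            label = lexicon.get(text[i:i + span])
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + span):
                    tags[j] = "I-" + label
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return tags


def crf_features(text: str, dict_tags: List[str]) -> List[Dict[str, str]]:
    """Second-layer input (assumed): character context plus the dictionary tag."""
    features = []
    for i, char in enumerate(text):
        features.append({
            "char": char,
            "prev_char": text[i - 1] if i > 0 else "<BOS>",
            "next_char": text[i + 1] if i < len(text) - 1 else "<EOS>",
            "dict_tag": dict_tags[i],
        })
    return features


sentence = "患者高血压伴头痛"
tags = dictionary_layer(sentence, DOMAIN_DICT)
features = crf_features(sentence, tags)
```

In this arrangement a sequence labeler trained on `features` can still assign an entity label where `dict_tag` is `O`, which is how unregistered words would be picked up by the learned layer.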


Source journal

工程设计学报 Engineering-Engineering (miscellaneous)

CiteScore: 0.60

Self-citation rate: 0.00%

Articles published: 2447

Review time: 14 weeks

Journal description: Chinese Journal of Engineering Design is a reputable journal published by Zhejiang University Press Co., Ltd. It was founded in December 1994 as the first internationally cooperative journal in the area of engineering design research. Administered by the Ministry of Education of China, it is sponsored by both Zhejiang University and the Chinese Society of Mechanical Engineering. Zhejiang University Press Co., Ltd. is fully responsible for its bimonthly domestic and overseas publication. Its pages are A4 size. The journal is devoted to reporting the most up-to-date achievements of engineering design research and thereby promoting communication between academic research and its industrial applications. Achievements of great creativity and practicability are especially welcome. Aiming to supply designers, developers, and researchers of diversified technical artifacts with valuable references, its content covers all aspects of design theory and methodology, as well as its enabling environment, for instance creative design, concurrent design, conceptual design, intelligent design, web-based design, reverse engineering design, industrial design, design optimization, tribology, design by biological analogy, virtual reality in design, structural analysis and design, design knowledge representation, design knowledge management, and design decision-making systems.