A Novel Method of Chinese Electronic Medical Records Entity Labeling Based on BIC model
Authors: Yifan Wang, Guowei Teng, Xuehai Ding, Guoqing Zhang, Yunchao Ling, Guozhong Wang
DOI: https://doi.org/10.17706/jsw.16.1.24-38
Journal: e Informatica Softw. Eng. J., volume 39 1, pages 24-38
Published: 2021-01-01 (Journal Article)
In biomedicine, large volumes of data are generated every day. Chinese electronic medical records (EMRs), for example, contain a large amount of medical terminology and domain-specific categories of entities. Analyzing these sparse data and extracting useful information from them is a research challenge. Named Entity Recognition (NER), a core Natural Language Processing (NLP) task, underpins the analysis of large amounts of biomedical text, and it depends on well-labeled data. There are two basic ways to produce sequence labels. The first is rule-based bulk corpus tagging, which requires domain experts to build a targeted recognition rule base; in practice, this approach is narrow in scope and less portable than expected, which severely limits it. The second is fully manual labeling, which is time-consuming and laborious. We propose the BIC model, which combines a Bidirectional Long Short-Term Memory network (BiLSTM), an Iterated Dilated Convolutional Neural Network (IDCNN), and a Conditional Random Field (CRF), and we use it as the basis of an EMR entity labeling method that automatically annotates Chinese EMR data. The machine-labeled data can be used after manual review, which greatly reduces the manual labeling workload. The BIC model reached an F1 score of 91.90% on the CCKS2017 dataset and 78% on PACS report data. Experiments show that our method outperforms the other models compared.
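The abstract reports F1 scores without describing how they are computed. As a minimal sketch, assuming BIO-style tags and exact-match, entity-level, micro-averaged scoring (a common NER convention; the paper's own evaluation protocol is not stated here), the metric could be computed as follows. The function names and the tag names `SYMPTOM` and `BODY` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (e.g. B-X, I-X, O)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact (boundary and type) span matches."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(g_spans & p_spans)
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction finds one of the two gold entities.
gold = [["B-SYMPTOM", "I-SYMPTOM", "O", "B-BODY"]]
pred = [["B-SYMPTOM", "I-SYMPTOM", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under this scoring, an entity counts as correct only when both its boundaries and its type match the gold annotation, so a partially recovered entity contributes nothing to the score.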