Research on Application of Named Entity Recognition of Electronic Medical Records Based on BERT-IDCNN-CRF Model
Authors: Xiaocheng Cai, Erhua Sun, Jiali Lei
Published in: Proceedings of the 6th International Conference on Graphics and Signal Processing, 2022-07-01
DOI: 10.1145/3561518.3561531 (https://doi.org/10.1145/3561518.3561531)
Citation count: 1
Abstract
Bi-LSTM-CRF (Bi-directional Long Short-Term Memory with Conditional Random Field) models perform well on Named Entity Recognition (NER) in Chinese Electronic Medical Records (EMRs). However, a Bi-LSTM-CRF model cannot make full use of GPU (Graphics Processing Unit) parallelism when processing massive volumes of medical records, while an IDCNN (Iterated Dilated Convolutional Neural Network) model neglects word-order features and semantic information, which leads to poor NER results. This paper therefore proposes a BERT-IDCNN-CRF model. In this model, the pre-trained bidirectional Transformer model BERT is fine-tuned on a manually annotated corpus that conforms to the BIOES (Begin, Inside, Outside, End, Single) tagging standard. BERT learns from text in an unsupervised manner and represents the semantic information of words as word vectors, which capture the contextual semantics of sentences in EMRs well. The state features of character sequences are learned through the BERT model, and the resulting sequence state scores are fed into the CRF layer. The CRF layer applies constrained optimization to the sequence state transitions, while IDCNN's convolutional encoding is more effective at recognizing local entities. In experiments, the average accuracy, recall and F1 value of the BERT-IDCNN-CRF model are 94.5%, 93.8% and 94.1% respectively, which are 4.8%, 4.3% and 3.6% higher than those of the baseline Word2Vec-BiLSTM-CRF model. The experiments show that the BERT-IDCNN-CRF model identifies medical entities in electronic medical records better than this baseline.
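The BIOES scheme and the transition constraints mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the example sentence, and the `DISEASE` label are all hypothetical, and the hard validity rule below only approximates what a trained CRF layer expresses through learned transition scores.

```python
from typing import List, Tuple


def spans_to_bioes(n_chars: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) entity spans into per-character BIOES tags."""
    tags = ["O"] * n_chars  # characters outside any entity stay "O"
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"  # single-character entity
        else:
            tags[start] = f"B-{label}"  # first character of the entity
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # interior characters
            tags[end - 1] = f"E-{label}"  # last character of the entity
    return tags


def is_valid_transition(prev: str, curr: str) -> bool:
    """Return True if tag `curr` may directly follow tag `prev` under BIOES."""
    if prev[0] in ("B", "I"):
        # An entity is still open: it must continue or end with the same label.
        return curr[0] in ("I", "E") and curr[2:] == prev[2:]
    # After O, E or S no entity is open: only O or a new entity start may follow.
    return curr[0] in ("O", "B", "S")


# Hypothetical example: the three-character entity in a five-character sentence.
sentence = "患者阑尾炎"
tags = spans_to_bioes(len(sentence), [(2, 5, "DISEASE")])
```

Here `tags` marks the first two characters as `O` and the entity as `B-DISEASE`, `I-DISEASE`, `E-DISEASE`. A sequence such as `B-DISEASE` followed by `O` is rejected by `is_valid_transition`, which is the kind of ill-formed output a CRF layer is meant to discourage.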