Using a Pre-Trained Language Model for Medical Named Entity Extraction in Chinese Clinic Text
Authors: Mengyuan Zhang, Jin Wang, Xuejie Zhang
Published in: 2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC)
Publication date: 2020-07-01
DOI: 10.1109/ICEIEC49280.2020.9152257 (https://doi.org/10.1109/ICEIEC49280.2020.9152257)
Citation count: 5
Abstract
Named entity recognition (NER) in Chinese clinical text is challenging. Existing methods are limited by the complexity of medical text structure, the wide variation in entity length, and identical entities that fall into different categories in different contexts. To address these problems, we propose a model that combines a pre-trained language model with a bi-directional long short-term memory (BiLSTM) network and a conditional random field (CRF). Because of the specialized nature of medical texts, we do not employ Chinese word segmentation tools. Instead, character-level features serve as input and are mapped to character embeddings by the embedding layer of the bidirectional encoder representations from transformers (BERT) model. A BiLSTM layer then encodes the character embeddings, and a CRF layer outputs the final labels. Experiments are conducted on CNMER2019 to evaluate performance, and the model is compared with several previous models. The results show that the proposed model outperforms the other models on NER in Chinese clinical text.
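The character-level labeling and CRF decoding mentioned in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. The BIO tagging scheme, the entity type name `DISEASE`, the example sentence, the function names `chars_to_bio` and `viterbi_decode`, and all numeric scores are assumptions made for illustration. In the described model, the per-character scores would come from a BiLSTM over BERT character embeddings; here they are hand-written numbers.

```python
from typing import Dict, List, Tuple


def chars_to_bio(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Assign one BIO tag per character, with no word segmentation.

    entities holds (start, end_exclusive, entity_type) character spans.
    The BIO scheme is an assumption; the abstract does not name one.
    """
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][label] is the score of `label` at character t.
    transitions[(prev, cur)] is the score of moving from prev to cur;
    pairs that are not listed default to 0.0.
    """
    if not emissions:
        return []
    # best[t][label]: best score of any path that ends in `label` at position t
    best = [{lab: emissions[0][lab] for lab in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical sentence: characters 3..5 form one DISEASE entity.
print(chars_to_bio("患者有高血压", [(3, 6, "DISEASE")]))
# ['O', 'O', 'O', 'B-DISEASE', 'I-DISEASE', 'I-DISEASE']

# Hand-written scores for two characters; the transition penalty forbids O -> I.
labels = ["O", "B-DISEASE", "I-DISEASE"]
emissions = [
    {"O": 2.0, "B-DISEASE": 1.0, "I-DISEASE": 0.0},
    {"O": 0.0, "B-DISEASE": 0.0, "I-DISEASE": 1.5},
]
transitions = {("O", "I-DISEASE"): -5.0}
print(viterbi_decode(emissions, transitions, labels))
# ['B-DISEASE', 'I-DISEASE']
```

In the second call, picking the top-scoring label at each character independently would give `O` followed by `I-DISEASE`, which is not a valid BIO sequence. The transition penalty steers the decoder to `B-DISEASE`, `I-DISEASE` instead, which illustrates why a CRF layer is commonly placed on top of per-character scores.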