{"title":"Chinese Named Entity Recognition Based on BERT with Whole Word Masking","authors":"Chao Liu, Cui Zhu, Wenjun Zhu","doi":"10.1145/3404555.3404563","DOIUrl":null,"url":null,"abstract":"Named Entity Recognition (NER) is a basic task of natural language processing and an indispensable part of machine translation, knowledge mapping and other fields. In this paper, a fusion model of Chinese named entity recognition using BERT, Bidirectional LSTM (BiLSTM) and Conditional Random Field (CRF) is proposed. In this model, Chinese BERT generates word vectors as a word embedding model. Word vectors through BiLSTM can learn the word label distribution. Finally, the model uses Conditional Random Fields to make syntactic restrictions at the sentence level to get annotation sequences. In addition, we can use Whole Word Masking (wwm) instead of the original random mask in BERT's pre-training, which can effectively solve the problem that the word in Chinese NER is partly masked, so as to improve the performance of NER model. In this paper, BERT-wwm (BERT-wwm is the BERT that uses Whole-Word-Masking in pre training tasks), BERT, ELMo and Word2Vec are respectively used for comparative experiments to reflect the effect of bert-wwm in this fusion model. The results show that using Chinese BERT-wwm as the language representation model of NER model has better recognition ability.","PeriodicalId":220526,"journal":{"name":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3404555.3404563","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
Named Entity Recognition (NER) is a fundamental task in natural language processing and an indispensable component of machine translation, knowledge graph construction, and other fields. In this paper, a fusion model for Chinese named entity recognition is proposed that combines BERT, a Bidirectional LSTM (BiLSTM), and Conditional Random Fields (CRF). In this model, Chinese BERT serves as the word embedding model and generates word vectors. A BiLSTM then learns the label distribution for each word from these vectors. Finally, a CRF layer imposes sentence-level constraints on the output to produce the final annotation sequence. In addition, Whole Word Masking (wwm) can replace the original random masking in BERT's pre-training, which effectively solves the problem that Chinese words are only partially masked and thereby improves the performance of the NER model. In this paper, BERT-wwm (BERT pre-trained with Whole Word Masking), BERT, ELMo, and Word2Vec are each used in comparative experiments to show the effect of BERT-wwm in the fusion model. The results show that the NER model using Chinese BERT-wwm as its language representation model achieves better recognition ability.
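As a rough illustration of the fusion architecture described in the abstract (BERT embeddings feeding a BiLSTM, whose per-token emission scores are decoded by a CRF), here is a minimal sketch in PyTorch. It assumes the HuggingFace `transformers` and `pytorch-crf` packages; the checkpoint name `hfl/chinese-bert-wwm`, the hidden size, and the class name are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of a BERT(-wwm) + BiLSTM + CRF tagger.
# Assumes `transformers` and `pytorch-crf`; hyperparameters are illustrative.
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class BertBiLstmCrf(nn.Module):
    def __init__(self, num_labels, bert_name="hfl/chinese-bert-wwm", hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)    # contextual embeddings
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_labels)         # emission scores per token
        self.crf = CRF(num_labels, batch_first=True)        # sentence-level constraints

    def forward(self, input_ids, attention_mask, labels=None):
        emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        feats, _ = self.bilstm(emb)                         # label distribution per token
        emissions = self.fc(feats)
        mask = attention_mask.bool()
        if labels is not None:
            # CRF returns the log-likelihood; negate it to get a loss to minimize.
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)        # best label sequence
```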
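The difference between BERT's original character-level masking and Whole Word Masking can be shown with a toy example. Chinese BERT tokenizes text into single characters, so random masking can hide only part of a multi-character word; wwm masks all characters of a chosen word together. The sentence, the 15% masking rate, and the hard-coded segmentation below are illustrative assumptions; in practice the word boundaries would come from a segmenter.

```python
# Toy comparison of character-level masking vs. Whole Word Masking (wwm).
import random

sentence = list("哈尔滨是黑龙江的省会")            # character-level tokens
words = ["哈尔滨", "是", "黑龙江", "的", "省会"]    # hard-coded word segmentation

# Original BERT: mask individual characters independently,
# which can leave a word like 哈尔滨 only partially masked.
char_masked = ["[MASK]" if random.random() < 0.15 else ch for ch in sentence]

# Whole Word Masking: pick words, then mask all of their characters at once.
wwm_masked = []
for w in words:
    if random.random() < 0.15:
        wwm_masked.extend(["[MASK]"] * len(w))      # entire word hidden together
    else:
        wwm_masked.extend(list(w))

print("char-level:", "".join(char_masked))
print("whole-word:", "".join(wwm_masked))
```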