Chinese Named Entity Recognition Based on BERT with Whole Word Masking

Chao Liu, Cui Zhu, Wenjun Zhu
{"title":"Chinese Named Entity Recognition Based on BERT with Whole Word Masking","authors":"Chao Liu, Cui Zhu, Wenjun Zhu","doi":"10.1145/3404555.3404563","DOIUrl":null,"url":null,"abstract":"Named Entity Recognition (NER) is a basic task of natural language processing and an indispensable part of machine translation, knowledge mapping and other fields. In this paper, a fusion model of Chinese named entity recognition using BERT, Bidirectional LSTM (BiLSTM) and Conditional Random Field (CRF) is proposed. In this model, Chinese BERT generates word vectors as a word embedding model. Word vectors through BiLSTM can learn the word label distribution. Finally, the model uses Conditional Random Fields to make syntactic restrictions at the sentence level to get annotation sequences. In addition, we can use Whole Word Masking (wwm) instead of the original random mask in BERT's pre-training, which can effectively solve the problem that the word in Chinese NER is partly masked, so as to improve the performance of NER model. In this paper, BERT-wwm (BERT-wwm is the BERT that uses Whole-Word-Masking in pre training tasks), BERT, ELMo and Word2Vec are respectively used for comparative experiments to reflect the effect of bert-wwm in this fusion model. The results show that using Chinese BERT-wwm as the language representation model of NER model has better recognition ability.","PeriodicalId":220526,"journal":{"name":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3404555.3404563","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Named Entity Recognition (NER) is a fundamental task in natural language processing and an indispensable component of machine translation, knowledge graph construction, and other fields. This paper proposes a fusion model for Chinese NER that combines BERT, a Bidirectional LSTM (BiLSTM), and a Conditional Random Field (CRF). In this model, Chinese BERT serves as the word embedding layer and generates contextual word vectors; the BiLSTM then learns the distribution of labels over the sequence; finally, the CRF applies sentence-level constraints between adjacent labels to produce the final annotation sequence. In addition, Whole Word Masking (wwm) replaces the original character-level random masking in BERT's pre-training, which avoids the problem of a Chinese word being only partially masked and thereby improves the performance of the NER model. Comparative experiments with BERT-wwm (BERT pre-trained with Whole Word Masking), BERT, ELMo, and Word2Vec are conducted to show the effect of BERT-wwm in this fusion model. The results show that using Chinese BERT-wwm as the language representation model gives the NER model better recognition ability.
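
To make the pipeline concrete, below is a minimal sketch of the BERT-BiLSTM-CRF architecture the abstract describes, assuming PyTorch with the Hugging Face `transformers` and `pytorch-crf` packages. The layer sizes and the `hfl/chinese-bert-wwm` checkpoint name are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a BERT(-wwm) -> BiLSTM -> CRF tagger.
# Assumptions (not from the paper): hidden sizes, checkpoint name.
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF


class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags: int, lstm_hidden: int = 256):
        super().__init__()
        # BERT-wwm produces contextual embeddings (768-dim for the base model).
        self.bert = BertModel.from_pretrained("hfl/chinese-bert-wwm")
        # The BiLSTM learns label-distribution features over the sequence.
        self.bilstm = nn.LSTM(
            input_size=self.bert.config.hidden_size,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)
        # The CRF applies sentence-level constraints between adjacent labels.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.bilstm(hidden)
        emissions = self.emissions(lstm_out)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence.
        return self.crf.decode(emissions, mask=mask)
```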
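
The motivation for Whole Word Masking can also be shown with a toy example. The sketch below is not the paper's pre-training code; it only contrasts BERT's character-level masking with whole-word masking, assuming word boundaries are supplied by an external Chinese segmenter.

```python
# Toy contrast between character-level masking (original BERT) and
# whole word masking (BERT-wwm) for Chinese. Illustrative only.
import random


def char_level_mask(chars, mask_ratio=0.15, seed=0):
    """Original BERT: mask individual characters independently."""
    rng = random.Random(seed)
    return ["[MASK]" if rng.random() < mask_ratio else c for c in chars]


def whole_word_mask(words, mask_ratio=0.15, seed=0):
    """BERT-wwm: pick words, then mask every character inside them."""
    rng = random.Random(seed)
    out = []
    for word in words:
        if rng.random() < mask_ratio:
            out.extend("[MASK]" for _ in word)  # mask the whole word
        else:
            out.extend(word)
    return out


# "使用语言模型" segmented as 使用 / 语言 / 模型 ("use" / "language" / "model").
words = ["使用", "语言", "模型"]
chars = [c for w in words for c in w]

# With this seed, character-level masking splits the word 模型:
# only 型 is masked, leaving a partially visible word.
print(char_level_mask(chars, mask_ratio=0.5))
# Whole word masking treats 模型 as a unit: both characters are masked.
print(whole_word_mask(words, mask_ratio=0.5))
```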