Named-entity recognition method of key population information based on improved BiLSTM-CRF model

Bi-Jun Ren, Fanliang Bu, Yihui Fu, Zhiwen Hou
2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA)
Publication date: 2021-06-28
DOI: https://doi.org/10.1109/ICAICA52286.2021.9497963

Abstract

Control of key populations bears on national security and social stability. To address the difficulty of extracting key population information entities, this paper proposes a sequence labeling method based on dynamic word vectors. The Word2vec pre-trained model is replaced with the BERT model, which strengthens the feature extraction capability of the traditional entity extraction model and describes the multiple semantic and syntactic information of words more fully. The BiLSTM-CRF model is improved as follows: the BERT model is embedded upstream of the model and converts the original corpus into a dynamic vectorized representation; the resulting word vectors are fed into the BiLSTM layer for semantic encoding, which further mines the semantically related features of the entity context; finally, the CRF layer outputs the label sequence with the maximum probability. After training on a key population information data set, the model reached an F1 value of 0.90.
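The final step, in which the CRF layer outputs the label sequence with the maximum probability, is commonly realized with Viterbi decoding. The following is a minimal sketch of that step only, not the authors' implementation. The function name viterbi_decode, the three-tag set (O, B-PER, I-PER), and all scores are hypothetical and chosen for illustration; in the described model, the per-token emission scores would come from the BiLSTM layer and the transition scores would be learned by the CRF layer.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the tag-index sequence with the highest total score.

    emissions[t][j]   : score of tag j at token t (assumed to come from the BiLSTM).
    transitions[i][j] : score of moving from tag i to tag j (assumed CRF parameters).
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    # best[j] holds the best score of any path ending in tag j at the current token.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Choose the previous tag that maximizes the score of reaching tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the backpointers to the first token.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical tag set and scores, for illustration only.
TAGS = ["O", "B-PER", "I-PER"]
TRANSITIONS = [
    [0.5, 0.0, -10.0],  # from O: moving directly to I-PER is heavily penalized
    [0.0, -1.0, 1.0],   # from B-PER
    [0.0, -1.0, 0.5],   # from I-PER
]

if __name__ == "__main__":
    emissions = [[0.1, 2.0, 0.0], [0.5, 0.0, 1.0], [2.0, 0.1, 0.3]]
    print([TAGS[i] for i in viterbi_decode(emissions, TRANSITIONS)])
```

Because the transition scores are part of the path score, the decoder can override a token-by-token choice: with emissions [[1.0, 0.8, 0.0], [0.0, 0.0, 1.0]], picking the best tag per token independently would give O followed by I-PER, whereas the decoder returns B-PER followed by I-PER.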