Named Entity Recognition Incorporating Chinese Segmentation Information

2021 International Conference on Advanced Computing and Endogenous Security. Pub Date: 2022-04-21. DOI: 10.1109/IEEECONF52377.2022.10013348

Shiyuan Yu, Shuming Guo, Ruiyang Huang, Jianpeng Zhang, Ke Su, Nan Hu

Citations: 0

Abstract

Word-level information is crucial for Chinese named entity recognition. Most recent work improves performance by using existing lexicons to inject word-level information into character-level representations, but maintaining those lexicons is a major challenge. In this paper, we present the NIMSI model, which incorporates information from multiple segmentations to enhance recognition. It uses a three-step procedure to align character-level attention with word-level attention, and from this it constructs features that encode the segmentation information in Chinese text. We also use a simple but effective method to incorporate the multi-segmentation information directly into character-level representations. Experiments on three benchmark datasets show that our model effectively incorporates segmentation information and alleviates segmentation errors.
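The abstract does not say how the segmentation information is encoded, so the following is only a minimal sketch of one common approach, not the NIMSI method itself. It assumes each segmenter's output is turned into per-character BMES tags (begin, middle, end, single) and that the one-hot tags from all segmenters are concatenated into one feature vector per character. The function names, the tag scheme, and the example sentence are illustrative assumptions, not taken from the paper.

```python
from typing import List

# Assumed tag scheme: B = begin, M = middle, E = end, S = single-character word.
TAGS = ["B", "M", "E", "S"]


def bmes_tags(words: List[str]) -> List[str]:
    """Map one segmentation (a list of words) to one BMES tag per character."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def one_hot(tag: str) -> List[int]:
    """Encode a BMES tag as a 4-dimensional one-hot vector."""
    return [1 if tag == t else 0 for t in TAGS]


def multi_segmentation_features(
    sentence: str, segmentations: List[List[str]]
) -> List[List[int]]:
    """Concatenate the one-hot BMES tags from every segmentation, per character."""
    features: List[List[int]] = [[] for _ in sentence]
    for words in segmentations:
        if "".join(words) != sentence:
            raise ValueError("segmentation does not reproduce the sentence")
        for i, tag in enumerate(bmes_tags(words)):
            features[i].extend(one_hot(tag))
    return features


# Two hypothetical segmenters disagree on the same sentence.
sentence = "南京市长江大桥"
segmentations = [
    ["南京市", "长江大桥"],
    ["南京", "市长", "江大桥"],
]
features = multi_segmentation_features(sentence, segmentations)
```

Each character ends up with a vector of length 4 × (number of segmentations), which could then be concatenated with that character's embedding. Keeping every segmenter's view, rather than committing to a single segmentation, is one way a model could be made less sensitive to any individual segmenter's errors.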



