HiNER: Hierarchical feature fusion for Chinese named entity recognition

IF 5.5; Zone 2 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence)
Shuxiang Hou, Yurong Qian, Jiaying Chen, Jigui Zhao, Huiyong Lv, Jiyuan Zhang, Hongyong Leng, Mengnan Ma
Journal: Neurocomputing (Journal Article), published 2024-10-05
DOI: 10.1016/j.neucom.2024.128667
URL: https://www.sciencedirect.com/science/article/pii/S0925231224014383
Citations: 0

Abstract

Named Entity Recognition (NER) aims to extract structured entity information from unstructured text by identifying entity boundaries and categories. Chinese NER is more challenging than English NER because of the language's complex structure and ambiguous word boundaries, and because entities can be nested or discontinuous. Previous Chinese NER methods are limited by their character-based approach and by their dependence on external lexical information, which is often non-contextualized; this introduces noise and can compromise model performance. This paper proposes a novel Chinese NER model, HiNER, which leverages external semantic enhancement and hierarchical attention fusion. Specifically, we first formulate Chinese NER as a character–character relation classification task that accounts for nested and discontinuous entities. Then, by incorporating syntactic information, we develop a Triformer module that integrates Chinese character, lexical, and syntactic embeddings, taking into account the impact of external semantic enhancement on the original text embeddings and reducing interference from extrinsic information to some extent. In addition, fusing local and global attention mechanisms enhances the representation of character–character relationships, allowing semantic features to be captured at various hierarchical levels within the Chinese context. We conduct extensive experiments on seven Chinese NER datasets, and the results indicate that HiNER achieves state-of-the-art (SOTA) performance. The results also confirm that external semantic enhancement and hierarchical attention fusion help with the Chinese NER task.
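
To make the character–character relation formulation concrete, here is a minimal sketch of how nested and discontinuous entities could be encoded in, and decoded from, a grid of pairwise labels. The abstract does not give HiNER's actual label set, so the scheme below is an assumption: one "next character" relation in the upper triangle and one tail-to-head relation per entity type in the lower triangle, similar in spirit to grid-tagging NER approaches. The names `build_grid`, `decode_grid`, `NONE`, `NEXT`, and the type ids are hypothetical.

```python
from typing import Dict, List, Tuple

# An entity is (character indices in increasing order, entity type).
Entity = Tuple[List[int], str]

# Hypothetical label ids (not taken from the paper).
NONE, NEXT = 0, 1  # no relation / "j is the next character after i in an entity"


def build_grid(n_chars: int, entities: List[Entity],
               type_ids: Dict[str, int]) -> List[List[int]]:
    """Encode entities as an n x n grid of character-character labels.

    Upper triangle: grid[i][j] = NEXT when character j follows character i
    inside an entity. The two need not be adjacent in the text, which is how
    a discontinuous entity is represented.
    Lower triangle and diagonal: grid[tail][head] = type id, which marks the
    entity boundary and its category.
    """
    grid = [[NONE] * n_chars for _ in range(n_chars)]
    for indices, etype in entities:
        for a, b in zip(indices, indices[1:]):
            grid[a][b] = NEXT
        grid[indices[-1]][indices[0]] = type_ids[etype]
    return grid


def decode_grid(grid: List[List[int]],
                id_to_type: Dict[int, str]) -> List[Entity]:
    """Recover entities by following NEXT links from each head to its tail."""
    found: List[Entity] = []

    def walk(path: List[int], tail: int, etype: str) -> None:
        cur = path[-1]
        if cur == tail:
            found.append((list(path), etype))
            return
        for nxt in range(cur + 1, tail + 1):
            if grid[cur][nxt] == NEXT:
                path.append(nxt)
                walk(path, tail, etype)
                path.pop()

    for tail in range(len(grid)):
        for head in range(tail + 1):
            label = grid[tail][head]
            if label in id_to_type:
                walk([head], tail, id_to_type[label])
    return found


if __name__ == "__main__":
    text = "我在北京大学"  # "北京" (LOC) is nested inside "北京大学" (ORG)
    type_ids = {"LOC": 2, "ORG": 3}
    entities = [([2, 3], "LOC"), ([2, 3, 4, 5], "ORG")]
    grid = build_grid(len(text), entities, type_ids)
    id_to_type = {v: k for k, v in type_ids.items()}
    for indices, etype in decode_grid(grid, id_to_type):
        print("".join(text[i] for i in indices), etype)
```

In this sketch a model would predict one label per character pair, and decoding is a path search over the predicted grid. Because the links are stored per pair rather than per character, the nested "北京" and "北京大学" coexist without conflict, which a single tag per character cannot express.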
Source journal
Neurocomputing (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.