MISS: Multiple information span scoring for Chinese named entity recognition

IF 3.1 · Zone 3, Computer Science · Q2, Computer Science, Artificial Intelligence
Liyi Yang, Shuli Xing, Guojun Mao
Computer Speech and Language, vol. 92, Article 101783. Published 2025-02-22. DOI: 10.1016/j.csl.2025.101783
https://www.sciencedirect.com/science/article/pii/S0885230825000087
Citations: 0

Abstract

Named entity recognition (NER) has drawn much attention from researchers. In Chinese text, characters carry rich contextual and regularity-based information. Most previous Chinese NER models mine the boundary features of phrase spans but neglect the token information within spans and the relationships between adjacent spans, which leads to insufficient feature representations and thereby limits model performance. In this study, we construct a span-based NER model named MISS (Multiple Information Span Scoring). The model consists of two major modules: (1) a span extractor for type-independent entity extraction, in which relative position information is introduced into the sequence representations; and (2) a span classifier that fuses boundary and internal information into span representations for enhanced span scoring. In the span classifier, we also employ a convolutional layer to conduct cross-span interaction, which rectifies the classification scores. Entity predictions are decoded from the sum of the scores computed by the two modules. Our method is simple and effective. Without any external resources, MISS achieves considerable improvement on four benchmark datasets. Moreover, ablation experiments demonstrate the effectiveness of each component of our model.
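
The two-module scoring described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not give the architecture in detail, so the mean pooling used for internal information, the linear scorers, the 3x3 filter, the residual-style correction, the threshold decoding, and every function and parameter name below are assumptions made for illustration. The relative position encoding used in the span extractor is omitted.

```python
from typing import Dict, List, Tuple

Vec = List[float]
Span = Tuple[int, int]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def span_representation(h: List[Vec], i: int, j: int) -> Vec:
    """Concatenate boundary info (tokens i and j) with internal info.

    Assumption: internal info is the mean of the token vectors i..j.
    """
    width = j - i + 1
    inner = [sum(h[k][c] for k in range(i, j + 1)) / width for c in range(len(h[0]))]
    return h[i] + h[j] + inner


def cross_span_conv(grid: List[List[float]], kernel: List[List[float]]) -> List[List[float]]:
    """Apply a 3x3 filter over the (start, end) span grid.

    This is cross-correlation, as in typical deep learning conv layers.
    Cells outside the valid upper triangle (start > end) count as zero.
    """
    n = len(grid)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    a, b = i + di, j + dj
                    if 0 <= a <= b < n:
                        out[i][j] += kernel[di + 1][dj + 1] * grid[a][b]
    return out


def score_spans(h: List[Vec], w_ext: Vec, w_cls: List[Vec],
                kernel: List[List[float]]) -> Dict[Span, List[float]]:
    """Return, for every span (i, j) with i <= j, one summed score per entity type."""
    n = len(h)
    reps = {(i, j): span_representation(h, i, j) for i in range(n) for j in range(i, n)}
    # Module 1 (span extractor): a single type-independent score per span.
    ext = {s: dot(r, w_ext) for s, r in reps.items()}
    total: Dict[Span, List[float]] = {s: [] for s in reps}
    for w_t in w_cls:
        # Module 2 (span classifier): raw score for this type on an n x n grid.
        grid = [[0.0] * n for _ in range(n)]
        for (i, j), r in reps.items():
            grid[i][j] = dot(r, w_t)
        # Cross-span interaction; adding it to the raw score is an assumption.
        corr = cross_span_conv(grid, kernel)
        for (i, j) in reps:
            # Final score: sum of the two modules' scores.
            total[(i, j)].append(ext[(i, j)] + grid[i][j] + corr[i][j])
    return total


def decode(total: Dict[Span, List[float]], threshold: float = 0.0) -> List[Tuple[int, int, int]]:
    """Keep (start, end, type) when the best type score exceeds a hypothetical threshold."""
    entities = []
    for (i, j), scores in sorted(total.items()):
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > threshold:
            entities.append((i, j, best))
    return entities
```

In a trained model the weights `w_ext`, `w_cls`, and `kernel` would be learned, and `h` would come from a contextual encoder over the characters of the sentence. The sketch enumerates all spans, which costs O(n²) in sentence length, a common trade-off for span-based NER.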
Source journal
Computer Speech and Language (Engineering and Technology: Computer Science, Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 4.70%
Articles published: 80
Review time: 22.9 weeks
Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.