MISS: Multiple information span scoring for Chinese named entity recognition
Authors: Liyi Yang, Shuli Xing, Guojun Mao
Journal: Computer Speech and Language, volume 92, Article 101783
Publication date: 2025-02-22
DOI: 10.1016/j.csl.2025.101783
URL: https://www.sciencedirect.com/science/article/pii/S0885230825000087
Abstract
Named entity recognition (NER) has drawn much attention from researchers. In Chinese text, characters carry rich contextual and regularity-based information. Most previous work on Chinese NER extracts boundary features of phrase spans but neglects the token information within spans and the relationships between adjacent spans, which leads to insufficient feature representations and thereby limits model performance. In this study, we construct a span-based NER model named MISS (Multiple Information Span Scoring). The model consists of two major modules: (1) a span extractor for type-independent entity extraction, where relative position information is introduced into the sequence representations; and (2) a span classifier that fuses boundary and internal information into span representations for enhanced span scoring. In the span classifier, we also employ a convolutional layer to model cross-span interaction, which rectifies the classification scores. Entity predictions are decoded from the sum of the scores computed by the two modules. Our method is simple and effective. Without any external resources, MISS achieves considerable improvement on four benchmark datasets. Moreover, ablation experiments demonstrate the effectiveness of each component of the model.
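The scoring pipeline described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract gives no formulas, so every name here (`span_representation`, `smooth_over_neighbors`, `decode`), the use of a mean over span tokens as the "internal information", the 3x3 mean filter standing in for the learned convolutional layer, and the decoding threshold are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def span_representation(tokens: List[List[float]], start: int, end: int) -> List[float]:
    """Fuse boundary and internal information (assumed form):
    [start vector ; end vector ; mean of all token vectors in the span]."""
    inner = tokens[start:end + 1]
    dim = len(tokens[0])
    mean = [sum(vec[d] for vec in inner) / len(inner) for d in range(dim)]
    return tokens[start] + tokens[end] + mean


def smooth_over_neighbors(grid: Dict[Span, float]) -> Dict[Span, float]:
    """Toy stand-in for cross-span interaction: a fixed 3x3 mean filter over
    the (start, end) grid. A real model would learn the kernel weights."""
    out = {}
    for (s, e) in grid:
        neighbors = [grid[(s + ds, e + de)]
                     for ds in (-1, 0, 1) for de in (-1, 0, 1)
                     if (s + ds, e + de) in grid]
        out[(s, e)] = sum(neighbors) / len(neighbors)
    return out


def decode(extractor: Dict[Span, float],
           classifier: Dict[Span, Dict[str, float]],
           threshold: float = 0.0) -> Dict[Span, str]:
    """Add the type-independent extractor score to the best type score of
    each span and keep the span if the sum exceeds the (assumed) threshold."""
    result = {}
    for span, type_scores in classifier.items():
        label, score = max(type_scores.items(), key=lambda kv: kv[1])
        if extractor[span] + score > threshold:
            result[span] = label
    return result


# Hypothetical scores for two candidate spans of a three-token sentence.
extractor_scores = {(0, 1): 1.5, (2, 2): -2.0}
classifier_scores = {(0, 1): {"PER": 0.5, "LOC": -1.0},
                     (2, 2): {"PER": 0.2, "LOC": 1.0}}
entities = decode(extractor_scores, classifier_scores)
```

In this sketch the span `(2, 2)` is discarded even though the classifier prefers `LOC` for it, because the extractor score pulls the sum below the threshold. That is one plausible reading of how summing the two modules' scores lets the type-independent extractor veto unlikely spans.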
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.