Unified Neural Lexical Analysis Via Two-Stage Span Tagging
Authors: Yantuan Xian, Yefen Zhu, Zhentao Yu, Yuxin Huang, Junjun Guo, Yan Xiang
Journal: CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 1254-1267 (Journal Article)
Publication date: 2025-05-10
DOI: 10.1049/cit2.70015
URL: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.70015
Journal impact factor: 7.3 (JCR Q1, Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Lexical analysis is a fundamental task in natural language processing that comprises several subtasks, such as word segmentation (WS), part-of-speech (POS) tagging, and named entity recognition (NER). Recent work has shown that exploiting the relatedness between these subtasks can be beneficial. This paper proposes a unified neural framework that addresses these subtasks simultaneously. Departing from the sequence tagging paradigm, the proposed method tackles multitask lexical analysis via two-stage sequence span classification. First, the model detects word and named entity boundaries by multi-label classification over character spans in a sentence. Then, it assigns POS labels to words and entity labels to named entities by multi-class classification. Furthermore, a Gated Task Transformation (GTT) is proposed to encourage the model to share valuable features between tasks. The proposed model was evaluated on public Chinese and Thai datasets, where it is reported to achieve state-of-the-art results.
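The two-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' model: the neural span classifiers are replaced by hypothetical toy lookup tables (`TOY_WORDS`, `TOY_ENTITIES`), and the function names, the `max_len` cap, and the label sets are all assumptions made for illustration. It only shows the control flow: enumerate character spans, make a multi-label boundary decision per span, then make a multi-class label decision per detected span. The Gated Task Transformation is not modeled here.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # [start, end) character offsets

# Hypothetical toy tables standing in for learned classifiers (assumption).
TOY_WORDS = {"我": "PRON", "爱": "VERB", "北京": "NOUN"}
TOY_ENTITIES = {"北京": "LOC"}


def enumerate_spans(sentence: str, max_len: int = 4) -> List[Span]:
    """List every character span [i, j) whose length is at most max_len."""
    n = len(sentence)
    return [(i, j) for i in range(n)
            for j in range(i + 1, min(n, i + max_len) + 1)]


def detect_boundaries(sentence: str, spans: List[Span]) -> Dict[Span, Set[str]]:
    """Stage 1: multi-label decision; a span may be a word, an entity, or both."""
    detected: Dict[Span, Set[str]] = {}
    for i, j in spans:
        text = sentence[i:j]
        labels: Set[str] = set()
        if text in TOY_WORDS:
            labels.add("WORD")
        if text in TOY_ENTITIES:
            labels.add("ENTITY")
        if labels:
            detected[(i, j)] = labels
    return detected


def assign_labels(sentence: str,
                  detected: Dict[Span, Set[str]]) -> List[Tuple[str, str, str]]:
    """Stage 2: one multi-class decision per detected word or entity."""
    results: List[Tuple[str, str, str]] = []
    for (i, j), labels in sorted(detected.items()):
        text = sentence[i:j]
        if "WORD" in labels:
            results.append((text, "POS", TOY_WORDS[text]))
        if "ENTITY" in labels:
            results.append((text, "NER", TOY_ENTITIES[text]))
    return results


def analyze(sentence: str, max_len: int = 4) -> List[Tuple[str, str, str]]:
    """Run both stages over all candidate spans of the sentence."""
    spans = enumerate_spans(sentence, max_len)
    return assign_labels(sentence, detect_boundaries(sentence, spans))
```

In this sketch the span "北京" receives both a `WORD` and an `ENTITY` label in stage 1, which is why the first stage is framed as multi-label rather than multi-class. A real system would score each span with a trained network instead of a dictionary lookup.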
About the journal:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. It is a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), making its research openly accessible to read and share worldwide.