Chinese NER with High-Level Features in Specific Domain
Mengna Nie, Lianglun Cheng, Haiming Ye, Weiwen Zhang
2022 14th International Conference on Machine Learning and Computing (ICMLC), published 2022-02-18
DOI: 10.1145/3529836.3529937
Abstract
In recent years, the character-word lattice structure has achieved good performance in Chinese named entity recognition (NER). However, in specific domains such as the marine industry, many specialized words are hard to segment and therefore hard to exploit. To address this challenge, a method is needed that identifies domain-specific entities more reliably by using higher-level features. In this paper, we develop a new method based on multivariate data embedding that extracts higher-level character features in the embedding layer. Specifically, we extract these features with a CNN and integrate them with the lattice representation to obtain an enhanced embedding representation. By exploiting character information, our model better captures the morphological and semantic information of characters and thereby provides additional support for NER. Experimental results on our marine industry dataset demonstrate the superiority of our approach; in particular, it outperforms most previous models. An ablation study further validates the effect of the higher-level feature extraction.
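The abstract describes the core idea only at a high level: a CNN extracts higher-level features for each character, and these are integrated with the lattice representation to form an enhanced embedding. Below is a minimal pure-Python sketch of that idea under stated assumptions, not the authors' implementation. The function names `conv1d_relu` and `enhance` are hypothetical. The CNN is assumed to be a single 1D convolution over a window of neighboring character embeddings with zero padding and ReLU. The lattice representation is assumed to be already available as one vector per character. "Integrate" is assumed to mean concatenation. The paper's actual kernel sizes, CNN inputs, and fusion operator are not stated in the abstract.

```python
from typing import List

Vector = List[float]


def conv1d_relu(chars: List[Vector], kernels: List[List[Vector]], bias: List[float]) -> List[Vector]:
    """Hypothetical character-level 1D convolution with 'same' zero padding, then ReLU.

    chars:   n character embeddings, each of dimension d
    kernels: c filters; each filter is a window of k rows (k odd), each row a d-dim weight vector
    bias:    one bias value per filter
    Returns n feature vectors of dimension c, one per character.
    """
    n = len(chars)
    features: List[Vector] = []
    for i in range(n):
        row: Vector = []
        for kernel, b in zip(kernels, bias):
            pad = len(kernel) // 2  # centers the window so output length equals input length
            total = b
            for j, weights in enumerate(kernel):
                pos = i - pad + j
                if 0 <= pos < n:  # positions outside the sentence act as zero padding
                    total += sum(w * x for w, x in zip(weights, chars[pos]))
            row.append(max(0.0, total))  # ReLU non-linearity
        features.append(row)
    return features


def enhance(lattice: List[Vector], cnn_feats: List[Vector]) -> List[Vector]:
    """Assumed fusion step: concatenate each character's lattice vector with its CNN features."""
    if len(lattice) != len(cnn_feats):
        raise ValueError("lattice and CNN features must cover the same characters")
    return [lat + feat for lat, feat in zip(lattice, cnn_feats)]


# Usage: three characters with 2-dim embeddings and three width-3 filters (toy values).
chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],    # sums dimension 0 over the window
    [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]],    # copies dimension 1 of the center character
    [[0.0, 0.0], [-1.0, -1.0], [0.0, 0.0]],  # always negative here, so ReLU zeroes it
]
cnn = conv1d_relu(chars, kernels, [0.0, 0.0, 0.0])
print(cnn)  # [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
print(enhance([[0.5], [0.25], [0.75]], cnn))
```

Concatenation keeps the lattice and CNN features in separate dimensions, so a downstream sequence tagger can weight them independently. Element-wise addition would be an equally plausible reading of "integrate", but it would require both vectors to have the same dimension.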