Exploiting rich features for Chinese named entity recognition
Jianping Shen, Xuan Wang, S. Li, Lin Yao
2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering, pp. 278-282. Published 2010-11-01.
DOI: 10.1109/ISKE.2010.5680864 (https://doi.org/10.1109/ISKE.2010.5680864)
Citations: 0
Abstract
In this paper we design a multi-feature template for a CRF-based Chinese named entity recognizer. The template comprises basic features, prefix and suffix features, dictionary features, and combined features. We first run a pre-processing step that includes POS tagging and dictionary-based chunking. For the dictionary features, we use different proportions of the dictionaries in training and testing, which differs from the work reported in the literature; this applies especially to the person name, location name, and organization name dictionaries. For these three named entity dictionaries, the training dictionaries are only a subset of the testing dictionaries. Empirical results show that the multi-feature template is comprehensive and that using different proportions of some dictionaries in training and testing improves performance significantly. Our final system achieves an F-measure of 91.27% on the MSRA test corpus, which exceeds the SIGHAN 2006 results on the same test corpus.
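As a rough illustration of the dictionary-proportion idea described in the abstract, the sketch below builds per-character feature dictionaries in which the dictionary lookup uses only a subset of the entries at training time and the full dictionary at test time. It is not the authors' implementation. The function names, the feature keys, the 50% proportion, the toy location dictionary, and the character-level layout are all assumptions made for this example, since the abstract does not specify them. Prefix and suffix features, POS tags, and chunk features are left out for brevity.

```python
import random


def sample_dictionary(full_dict, proportion, seed=0):
    """Return a reproducible random subset with about `proportion` of the entries."""
    rng = random.Random(seed)
    entries = sorted(full_dict)  # sort so the sample does not depend on set order
    return set(rng.sample(entries, int(len(entries) * proportion)))


def dictionary_flags(chars, dictionary, max_len=4):
    """Flag every character that lies inside a substring found in `dictionary`."""
    flags = [False] * len(chars)
    for start in range(len(chars)):
        for end in range(start + 1, min(start + max_len, len(chars)) + 1):
            if "".join(chars[start:end]) in dictionary:
                for i in range(start, end):
                    flags[i] = True
    return flags


def char_features(chars, i, in_dict):
    """Build basic, dictionary, and combined features for character i."""
    prev_c = chars[i - 1] if i > 0 else "<BOS>"
    next_c = chars[i + 1] if i + 1 < len(chars) else "<EOS>"
    return {
        "char": chars[i],                             # basic feature
        "prev_char": prev_c,                          # basic context feature
        "next_char": next_c,                          # basic context feature
        "in_loc_dict": in_dict[i],                    # dictionary feature
        "char|in_loc_dict": f"{chars[i]}|{in_dict[i]}",  # combined feature
    }


def sentence_features(sentence, dictionary):
    """Return one feature dictionary per character of the sentence."""
    chars = list(sentence)
    in_dict = dictionary_flags(chars, dictionary)
    return [char_features(chars, i, in_dict) for i in range(len(chars))]


# Toy location dictionary (hypothetical entries, not taken from the paper).
full_location_dict = {"北京", "上海", "深圳", "哈尔滨"}

# Assumed setup: training sees only half of the entries, testing sees all of them.
train_location_dict = sample_dictionary(full_location_dict, proportion=0.5)

train_feats = sentence_features("我在深圳工作", train_location_dict)
test_feats = sentence_features("我在深圳工作", full_location_dict)
print(train_feats[2])
print(test_feats[2])
```

Because the training dictionary covers only part of the entries, some entity characters carry a negative dictionary flag during training. One plausible reading of the abstract is that this keeps the model from relying on dictionary hits alone, though the abstract itself does not state the reason. The per-character dictionaries produced here could be passed to any CRF toolkit that accepts feature dictionaries per token.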