On the Utility of Recovering Dropped Words in Chinese Named Entity Recognition
Dunbo Cai, Zhiguo Huang, Ling Qian
Proceedings of the 2021 6th International Conference on Intelligent Information Technology (published 2021-02-25). DOI: 10.1145/3460179.3460189
Named entity recognition (NER) in natural language processing (NLP) is the problem of identifying sequences of words in a sentence that mention an object (entity) of a predefined type, e.g., a person, organization, location, or time. NER methods are key to extracting knowledge from text, because entities are the anchors to which entity properties and entity relations are attached. However, NER for Chinese text is harder because some auxiliary words may be dropped from a sentence, a common practice in Chinese writing for the sake of brevity. A frequently dropped Chinese word is ‘的’ (which often functions like ‘of’ in English). One obvious effect of this omission is that it makes the sub-entities (nested named entities) contained in a named entity harder to identify. Previous work has studied the effect of recovering dropped pronouns in Chinese translation tasks. Here we propose a rule-based method to recover the auxiliary word ‘的’ in Chinese text, and study the effect of this recovery on the performance of FLAT, a state-of-the-art Chinese NER method. Experimental results on the Weibo-NER and MSRA-NER datasets show that our method improves the performance of FLAT. This study thus highlights the promise of recovering more types of dropped words for Chinese NER.
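The abstract does not spell out the recovery rules, so the following is only a hypothetical sketch, not the authors' rule set. It assumes one simple lexicon-driven rule: insert ‘的’ wherever a word from a `modifiers` lexicon is immediately followed by a word from a `heads` lexicon. Because Chinese NER data is usually labeled per character, inserting a character also requires realigning the BIO tags. The names `recover_de`, `inserted_tag`, `modifiers`, and `heads`, and the tagging convention for the inserted character, are all illustrative assumptions.

```python
from typing import List, Sequence, Set, Tuple

DE = "的"  # the auxiliary word whose recovery the paper studies


def inserted_tag(next_tag: str) -> str:
    """BIO tag for a character inserted just before a character tagged `next_tag`."""
    # If the next character continues an entity (I-X), the insertion lies inside
    # that entity and inherits I-X; otherwise it falls outside any entity.
    return "I-" + next_tag[2:] if next_tag.startswith("I-") else "O"


def recover_de(text: str, tags: Sequence[str],
               modifiers: Set[str], heads: Set[str]) -> Tuple[str, List[str]]:
    """Hypothetical rule (an assumption, not the paper's method): insert '的'
    wherever a word from `modifiers` is immediately followed by a word from
    `heads`, keeping the per-character BIO tags aligned with the new text."""
    if len(text) != len(tags):
        raise ValueError("need exactly one tag per character")
    out_chars: List[str] = []
    out_tags: List[str] = []
    i = 0
    while i < len(text):
        # Longest modifier word starting at position i ("" if none matches).
        mod = max((m for m in modifiers if m and text.startswith(m, i)),
                  key=len, default="")
        if mod:
            j = i + len(mod)
            out_chars.extend(text[i:j])
            out_tags.extend(tags[i:j])
            # Insert '的' only if a head word starts right after the modifier.
            if j < len(text) and any(text.startswith(h, j) for h in heads):
                out_chars.append(DE)
                out_tags.append(inserted_tag(tags[j]))
            i = j
        else:
            out_chars.append(text[i])
            out_tags.append(tags[i])
            i += 1
    return "".join(out_chars), out_tags
```

For instance, with `modifiers={"微软"}` and `heads={"总部"}`, the sentence "微软总部位于美国" becomes "微软的总部位于美国", and the inserted ‘的’ is tagged `O` because it sits between the ORG span and the untagged head noun; if the whole phrase "微软总部" were a single entity, the inserted character would instead inherit that entity's `I-` tag. A real rule set would need far richer conditions, since inserting ‘的’ into a fixed name can change its meaning.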