Honghao Zheng, Hongtao Yu, Yinuo Hao, Yiteng Wu, Shaomei Li
2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML), published 2021-07-16. DOI: https://doi.org/10.1109/PRML52754.2021.9520707
Distantly Supervised Named Entity Recognition with Spy-PU Algorithm
Named entity recognition (NER) underpins many natural language processing tasks. In Chinese NER, the scarcity of labeled data is the main factor limiting model performance. To address this, we propose a general approach that improves Chinese NER when only a few labeled samples are available. The method automatically labels unlabeled text under the distant supervision hypothesis and uses the Spy-PU algorithm to reduce the negative impact of the unlabeled entity problem. Experimental results show that the method achieves better performance on four public datasets (MSRA, OntoNotes 4.0, Resume, and Weibo) and effectively alleviates the impact of labeled-data sparsity.
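The abstract names two ingredients, distant supervision and a spy-based PU (positive-unlabeled) step, without giving implementation details. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes distant labeling is done by forward maximum matching against an entity lexicon, and it uses the classic spy technique from PU learning, in which a few known positives are hidden among the unlabeled items so that their scores set a threshold for picking reliable negatives. The names `distant_label`, `spy_reliable_negatives`, `toy_fit_scorer`, and the parameters `spy_ratio` and `noise` are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def distant_label(chars: Sequence[str], lexicon: Dict[str, str]) -> List[str]:
    """BIO-tag a character sequence by forward maximum matching.

    `lexicon` maps an entity surface form to its type (an assumed format).
    Characters not covered by any lexicon entry stay "O", which is exactly
    where unlabeled entities can hide.
    """
    tags = ["O"] * len(chars)
    max_len = max((len(key) for key in lexicon), default=0)
    i = 0
    while i < len(chars):
        for n in range(min(max_len, len(chars) - i), 0, -1):
            span = "".join(chars[i:i + n])
            if span in lexicon:
                tags[i] = f"B-{lexicon[span]}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{lexicon[span]}"
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here
    return tags


def spy_reliable_negatives(
    positives: Sequence,
    unlabeled: Sequence,
    fit_scorer: Callable[[list, list], Callable[[object], float]],
    spy_ratio: float = 0.15,
    noise: float = 0.05,
    seed: int = 0,
) -> list:
    """Select reliable negatives from `unlabeled` with the spy technique.

    `fit_scorer(pos, mixed)` is a caller-supplied trainer that returns a
    function scoring how positive an item looks (higher = more positive).
    """
    if len(positives) < 2:
        raise ValueError("need at least two positives to hold out a spy")
    rng = random.Random(seed)
    n_spies = min(len(positives) - 1, max(1, int(len(positives) * spy_ratio)))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [positives[i] for i in sorted(spy_idx)]
    remaining = [p for i, p in enumerate(positives) if i not in spy_idx]

    # Train with the spies mixed into the unlabeled side.
    score = fit_scorer(remaining, list(unlabeled) + spies)

    # Threshold: a low quantile of the spy scores, tolerating `noise` outliers.
    spy_scores = sorted(score(s) for s in spies)
    threshold = spy_scores[int(noise * len(spy_scores))]

    # Unlabeled items scoring below the spies are treated as reliable negatives.
    return [u for u in unlabeled if score(u) < threshold]


def toy_fit_scorer(pos: list, mixed: list) -> Callable[[float], float]:
    """Toy 1-D stand-in for a real classifier: a logistic curve centred
    midway between the two group means."""
    mid = (sum(pos) / len(pos) + sum(mixed) / len(mixed)) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-(x - mid)))


if __name__ == "__main__":
    lexicon = {"李明": "PER", "北京": "LOC", "北京大学": "ORG"}
    print(distant_label(list("李明在北京大学工作"), lexicon))
```

In a real NER setting the items passed to `spy_reliable_negatives` would be token or span representations and `fit_scorer` would train an actual tagger or classifier; the toy scorer exists only to keep the sketch runnable without external libraries.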