Distantly Supervised Named Entity Recognition with Spy-PU Algorithm
Honghao Zheng, Hongtao Yu, Yinuo Hao, Yiteng Wu, Shaomei Li
2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML), published 2021-07-16
DOI: https://doi.org/10.1109/PRML52754.2021.9520707
Citations: 1
Abstract
Named entity recognition (NER) is a foundation for many natural language processing tasks. In Chinese NER, the sparsity of labeled data is the core factor limiting model performance. To address this problem, we propose a general approach that improves Chinese NER when only a small number of labeled samples is available. A key feature of the proposed method is that it automatically labels unlabeled text under the distant supervision hypothesis and uses the Spy-PU algorithm to reduce the negative impact of the unlabeled entity problem. Experimental results show that the method achieves better performance on four public datasets (MSRA, OntoNotes4.0, Resume, and Weibo) and effectively alleviates the impact of labeled-data sparsity.
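The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names, not the authors' method. It assumes distant supervision is done by longest-match lookup in an entity lexicon, and it uses the generic "spy" technique from positive-unlabeled (PU) learning: a few known positives are hidden among the unlabeled examples, and unlabeled examples that score below the spies are kept as reliable negatives. The function names, the `spy_ratio` and `noise` parameters, the lexicon, and the scores are all hypothetical.

```python
import random


def distant_label(text, lexicon):
    """Assign character-level BIO tags by longest-match lookup in a lexicon.

    `lexicon` maps an entity string to its type (assumed format).
    Characters not covered by any lexicon entry are tagged "O", which is
    exactly where unlabeled entities can hide.
    """
    tags = ["O"] * len(text)
    max_len = max((len(entry) for entry in lexicon), default=0)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate span first.
        for n in range(min(max_len, len(text) - i), 0, -1):
            span = text[i:i + n]
            if span in lexicon:
                etype = lexicon[span]
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


def split_spies(positives, spy_ratio=0.15, seed=0):
    """Hold out a random fraction of the positives to act as spies."""
    rng = random.Random(seed)
    shuffled = list(positives)
    rng.shuffle(shuffled)
    k = max(1, int(len(shuffled) * spy_ratio))
    return shuffled[k:], shuffled[:k]  # (remaining positives, spies)


def reliable_negatives(unlabeled_scores, spy_scores, noise=0.0):
    """Select unlabeled items that score below the spy threshold.

    `unlabeled_scores` maps item -> positive-class score from any classifier
    trained on (remaining positives) vs (unlabeled + spies). `noise` is the
    fraction of lowest-scoring spies to ignore when setting the threshold.
    """
    ranked = sorted(spy_scores)
    index = min(int(len(ranked) * noise), len(ranked) - 1)
    threshold = ranked[index]
    return [item for item, score in unlabeled_scores.items() if score < threshold]
```

In this sketch the classifier that produces the scores is left out on purpose; any model that outputs a positive-class probability could be plugged in. Items selected by `reliable_negatives` would be treated as true non-entities, while the remaining unlabeled items stay uncertain instead of being forced to the "O" label.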