Atefeh Zafarian, Ali Rokni, Shahram Khadivi, Sonia Ghiasifard
{"title":"基于弱标记训练数据的半监督学习命名实体识别","authors":"Atefeh Zafarian, Ali Rokni, Shahram Khadivi, Sonia Ghiasifard","doi":"10.1109/AISP.2015.7123504","DOIUrl":null,"url":null,"abstract":"The shortage of the annotated training data is still an important challenge to building many Natural Language Process (NLP) tasks such as Named Entity Recognition. NER requires a large amount of training data with a high degree of human supervision whereas there is not enough labeled data for every language. In this paper, we use an unlabeled bilingual corpora to extract useful features from transferring information from resource-rich language toward resource-poor language and by using these features and a small training data, make a NER supervised model. Then we utilize a graph-based semi-supervised learning method that trains a CRF-based supervised classifier using that labeled data and uses high-confidence predictions on the unlabeled data to expand the training set and improve efficiency of NER model with the new training set.","PeriodicalId":405857,"journal":{"name":"2015 The International Symposium on Artificial Intelligence and Signal Processing (AISP)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Semi-supervised learning for named entity recognition using weakly labeled training data\",\"authors\":\"Atefeh Zafarian, Ali Rokni, Shahram Khadivi, Sonia Ghiasifard\",\"doi\":\"10.1109/AISP.2015.7123504\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The shortage of the annotated training data is still an important challenge to building many Natural Language Process (NLP) tasks such as Named Entity Recognition. NER requires a large amount of training data with a high degree of human supervision whereas there is not enough labeled data for every language. In this paper, we use an unlabeled bilingual corpora to extract useful features from transferring information from resource-rich language toward resource-poor language and by using these features and a small training data, make a NER supervised model. Then we utilize a graph-based semi-supervised learning method that trains a CRF-based supervised classifier using that labeled data and uses high-confidence predictions on the unlabeled data to expand the training set and improve efficiency of NER model with the new training set.\",\"PeriodicalId\":405857,\"journal\":{\"name\":\"2015 The International Symposium on Artificial Intelligence and Signal Processing (AISP)\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 The International Symposium on Artificial Intelligence and Signal Processing (AISP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AISP.2015.7123504\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 The International Symposium on Artificial Intelligence and Signal Processing (AISP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AISP.2015.7123504","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Semi-supervised learning for named entity recognition using weakly labeled training data
The shortage of the annotated training data is still an important challenge to building many Natural Language Process (NLP) tasks such as Named Entity Recognition. NER requires a large amount of training data with a high degree of human supervision whereas there is not enough labeled data for every language. In this paper, we use an unlabeled bilingual corpora to extract useful features from transferring information from resource-rich language toward resource-poor language and by using these features and a small training data, make a NER supervised model. Then we utilize a graph-based semi-supervised learning method that trains a CRF-based supervised classifier using that labeled data and uses high-confidence predictions on the unlabeled data to expand the training set and improve efficiency of NER model with the new training set.