A Feature-based Ensemble Approach to Recognition of Emerging and Rare Named Entities

NUT@EMNLP, Pub Date: 2017-09-01, DOI: 10.18653/v1/W17-4424

Utpal Kumar Sikdar, Björn Gambäck


Citations: 8

Abstract



Detecting previously unseen named entities in text is a challenging task. The paper describes how three initial classifier models were built using Conditional Random Fields (CRFs), Support Vector Machines (SVMs) and a Long Short-Term Memory (LSTM) recurrent neural network. The outputs of these three classifiers were then used as features to train another CRF classifier working as an ensemble. 5-fold cross-validation based on training and development data for the emerging and rare named entity recognition shared task showed precision, recall and F1-score of 66.87%, 46.75% and 54.97%, respectively. For surface form evaluation, the CRF ensemble-based system achieved precision, recall and F1 scores of 65.18%, 45.20% and 53.30%. When applied to unseen test data, the model reached 47.92% precision, 31.97% recall and 38.55% F1-score for entity level evaluation, with the corresponding surface form evaluation values of 44.91%, 30.47% and 36.31%.
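The stacking step described in the abstract, where the outputs of the three base classifiers become features for a second-stage CRF, can be illustrated with a minimal sketch. The abstract does not list the actual feature set, so everything named here (`stack_features`, the feature keys, the `all_agree` flag and the example tags) is a hypothetical illustration, not the authors' implementation. The resulting per-token feature dictionaries are in the general shape that CRF toolkits commonly accept.

```python
from typing import Dict, List, Union

FeatureDict = Dict[str, Union[str, bool]]


def stack_features(
    tokens: List[str],
    crf_tags: List[str],
    svm_tags: List[str],
    lstm_tags: List[str],
) -> List[FeatureDict]:
    """Build one feature dict per token from three base taggers' outputs.

    Hypothetical sketch: the feature names below are assumptions made for
    illustration and are not taken from the paper.
    """
    if not (len(tokens) == len(crf_tags) == len(svm_tags) == len(lstm_tags)):
        raise ValueError("tokens and all tag sequences must have equal length")

    features: List[FeatureDict] = []
    for token, crf, svm, lstm in zip(tokens, crf_tags, svm_tags, lstm_tags):
        features.append(
            {
                "token_lower": token.lower(),
                # Predicted labels of the base models act as categorical features.
                "crf_tag": crf,
                "svm_tag": svm,
                "lstm_tag": lstm,
                # Assumed extra signal: do all three base models agree?
                "all_agree": len({crf, svm, lstm}) == 1,
            }
        )
    return features


# Toy sentence with made-up base-model predictions.
example = stack_features(
    ["Visiting", "Oslo"],
    ["O", "B-location"],
    ["O", "B-location"],
    ["O", "O"],
)
```

In a full pipeline, these dictionaries would be paired with the gold labels and passed to a sequence learner as the ensemble training data; the base-model predictions used for training would normally be produced on held-out folds so the second stage does not learn from overfitted outputs.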
