A Multi-task Approach for Named Entity Recognition in Social Media Data

NUT@EMNLP  Pub Date: 2017-09-01  DOI: 10.18653/v1/W17-4419
Gustavo Aguilar, Suraj Maharjan, Adrian Pastor Lopez-Monroy, T. Solorio
Citations: 130

Abstract

Named Entity Recognition for social media data is challenging because of its inherent noisiness. In addition to improper grammatical structures, it contains spelling inconsistencies and numerous informal abbreviations. We propose a novel multi-task approach by employing a more general secondary task of Named Entity (NE) segmentation together with the primary task of fine-grained NE categorization. The multi-task neural network architecture learns higher order feature representations from word and character sequences along with basic Part-of-Speech tags and gazetteer information. This neural network acts as a feature extractor to feed a Conditional Random Fields classifier. We were able to obtain the first position in the 3rd Workshop on Noisy User-generated Text (WNUT-2017) with a 41.86% entity F1-score and a 40.24% surface F1-score.
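
As a rough illustration of how the two tasks relate, the sketch below derives targets for the more general segmentation task from fine-grained tags. It assumes BIO-style tags such as `B-person` and a binary entity/non-entity segmentation label; both the tag format and the helper names `segmentation_labels` and `multitask_targets` are assumptions made for illustration, not details taken from the abstract.

```python
from typing import List, Tuple


def segmentation_labels(bio_tags: List[str]) -> List[int]:
    """Collapse fine-grained BIO tags into a binary label.

    1 means the token belongs to some named entity, 0 means it does not.
    (Assumed formulation of the secondary segmentation task.)
    """
    return [0 if tag == "O" else 1 for tag in bio_tags]


def multitask_targets(bio_tags: List[str]) -> List[Tuple[str, int]]:
    """Pair each primary-task tag with its secondary-task label."""
    return list(zip(bio_tags, segmentation_labels(bio_tags)))


# Hypothetical tag sequence for a five-token post.
tags = ["O", "B-person", "I-person", "O", "B-location"]
print(multitask_targets(tags))
```

Because the segmentation labels are a deterministic coarsening of the categorization labels, the secondary task needs no extra annotation; a shared network can be trained on both targets at once.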