NUT@EMNLP: Latest Publications

Empirical Evaluation of Character-Based Model on Neural Named-Entity Recognition in Indonesian Conversational Texts
NUT@EMNLP, published 2018-05-01, DOI: 10.18653/v1/W18-6112
Kemal Kurniawan, Samuel Louvan
Abstract: Despite the long history of the named-entity recognition (NER) task in the natural language processing community, previous work has rarely studied the task on conversational texts. Such texts are challenging because they contain many word variations, which increase the number of out-of-vocabulary (OOV) words. The high number of OOV words poses a difficulty for word-based neural models. Meanwhile, there is plenty of evidence for the effectiveness of character-based neural models in mitigating this OOV problem. We report an empirical evaluation of neural sequence labeling models with character embedding to tackle the NER task in Indonesian conversational texts. Our experiments show that (1) character models outperform word embedding-only models by up to 4 F1 points, (2) character models perform better in OOV cases, with an improvement of as much as 15 F1 points, and (3) character models are robust against a very high OOV rate.
Citations: 7
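The OOV problem described in this abstract can be illustrated with a small sketch. It is not the authors' neural model. It only shows two ideas: how an OOV rate is measured, and why character-level information helps. A spelling variant that is unseen at the word level still shares most of its character n-grams with the known form. The vocabulary, tokens, and the choice of trigrams with `<`/`>` boundary markers are all hypothetical.

```python
def oov_rate(test_tokens, train_vocab):
    """Fraction of test tokens that never occur in the training vocabulary."""
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in train_vocab)
    return unseen / len(test_tokens)


def char_ngrams(word, n=3):
    """Set of character n-grams, with '<' and '>' marking the word boundaries."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_overlap(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two spellings."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


# Toy data (hypothetical): abbreviated and elongated spellings are word-level OOV.
train_vocab = {"saya", "mau", "ke", "jakarta"}
print(oov_rate(["sy", "mau", "ke", "jkt"], train_vocab))  # 0.5
print(ngram_overlap("jakarta", "jakartaa"))  # 6 shared of 9 trigrams, about 0.67
```

A word-embedding lookup treats "jakartaa" as entirely unknown. At the character level, however, it overlaps heavily with "jakarta", and that shared signal is what character models can exploit.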
Simple Queries as Distant Labels for Predicting Gender on Twitter
NUT@EMNLP, published 2017-09-01, DOI: 10.18653/v1/W17-4407
Chris Emmery, Grzegorz Chrupała, Walter Daelemans
Abstract: The majority of research on extracting missing user attributes from social media profiles uses costly hand-annotated labels for supervised learning. Distantly supervised methods exist, although these generally rely on knowledge gathered from external sources. This paper demonstrates the effectiveness of gathering distant labels for self-reported gender on Twitter using simple queries. We confirm the reliability of this query heuristic by comparing it with manual annotation. Moreover, using these labels for distant supervision, we demonstrate model performance on the same data that is competitive with models trained on manual annotations. As such, we offer a cheap, extensible, and fast alternative that can be employed beyond the task of gender classification.
Citations: 16
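The query heuristic in this abstract can be sketched as pattern matching over tweets. The regular expressions below are hypothetical stand-ins, not the queries Emmery et al. actually used. A label is assigned only when exactly one pattern family matches, and ambiguous or non-matching tweets stay unlabeled.

```python
import re
from typing import Optional

# Hypothetical self-report patterns; NOT the queries used in the paper.
PATTERNS = {
    "female": re.compile(r"\bi(?:'m| am) an? (?:girl|woman|mother|mom)\b", re.IGNORECASE),
    "male": re.compile(r"\bi(?:'m| am) an? (?:guy|man|father|dad)\b", re.IGNORECASE),
}


def distant_label(tweet: str) -> Optional[str]:
    """Return a distant gender label if exactly one pattern family matches, else None."""
    hits = [label for label, pattern in PATTERNS.items() if pattern.search(tweet)]
    return hits[0] if len(hits) == 1 else None


print(distant_label("I'm a girl who loves football"))  # female
print(distant_label("great game tonight"))             # None
```

Discarding tweets that match both families or neither trades recall for precision. That seems a reasonable default for producing distant labels, although the paper's own filtering rules may differ.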
Constructing an Alias List for Named Entities during an Event
NUT@EMNLP, published 2017-09-01, DOI: 10.18653/v1/W17-4405
Anietie U Andy, Mark Dredze, M. Rwebangira, Chris Callison-Burch
Abstract: In certain fields, real-time knowledge from events can help in making informed decisions. In order to extract pertinent real-time knowledge related to an event, it is important to identify the named entities related to the event and their corresponding aliases. The problem of identifying aliases of named entities that spike in popularity has remained unexplored. In this paper, we introduce an algorithm, EntitySpike, that identifies entities that spike in popularity in tweets from a given time period, and constructs an alias list for these spiked entities. EntitySpike uses a temporal heuristic to identify named entities with similar context that occur in the same time period (within minutes) during an event. Each entity is encoded as a vector using this temporal heuristic. We show how these entity vectors can be used to create a named entity alias list. We evaluated our algorithm on a dataset of temporally ordered tweets from a single event, the 2013 Grammy Awards show. We carried out various experiments on tweets that were published in the same time period and show that our algorithm identifies most entity name aliases and outperforms a competitive baseline.
Citations: 3
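The general idea of encoding entities by time-bounded context can be sketched as follows. This is a simplified illustration, not EntitySpike itself. Here each context word is keyed together with a one-minute time bucket, so two entity strings look similar only if they share context words within the same window. The window size, the tokenization, and the entity strings are assumptions.

```python
import math
from collections import Counter, defaultdict


def context_vectors(tweets, entities, window_seconds=60):
    """tweets: list of (timestamp_seconds, tokens).

    For each entity, count (time_bucket, word) pairs for words that co-occur
    with the entity in the same tweet. Keying by bucket makes the vector
    time-aware.
    """
    vectors = defaultdict(Counter)
    for ts, tokens in tweets:
        bucket = int(ts // window_seconds)
        for ent in entities:
            if ent in tokens:
                for tok in tokens:
                    if tok != ent:
                        vectors[ent][(bucket, tok)] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical tweets: two spellings of one entity in the same minute.
tweets = [
    (5, ["@artist_x", "opening", "performance", "wow"]),
    (30, ["artistx", "opening", "performance", "amazing"]),
    (4000, ["band_y", "wins", "award"]),
]
vecs = context_vectors(tweets, ["@artist_x", "artistx", "band_y"])
print(cosine(vecs["@artist_x"], vecs["artistx"]))  # about 0.67
print(cosine(vecs["@artist_x"], vecs["band_y"]))   # 0.0
```

Candidate aliases could then be ranked by this similarity. How the paper thresholds or ranks candidates is not specified here.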
A Multi-task Approach for Named Entity Recognition in Social Media Data
NUT@EMNLP, published 2017-09-01, DOI: 10.18653/v1/W17-4419
Gustavo Aguilar, Suraj Maharjan, Adrian Pastor Lopez-Monroy, T. Solorio
Abstract: Named Entity Recognition for social media data is challenging because of its inherent noisiness. In addition to improper grammatical structures, it contains spelling inconsistencies and numerous informal abbreviations. We propose a novel multi-task approach by employing a more general secondary task of Named Entity (NE) segmentation together with the primary task of fine-grained NE categorization. The multi-task neural network architecture learns higher-order feature representations from word and character sequences along with basic Part-of-Speech tags and gazetteer information. This neural network acts as a feature extractor to feed a Conditional Random Fields classifier. We obtained first place in the 3rd Workshop on Noisy User-generated Text (WNUT-2017) with a 41.86% entity F1-score and a 40.24% surface F1-score.
Citations: 130
Evaluating hypotheses in geolocation on a very large sample of Twitter
NUT@EMNLP, published 2017-09-01, DOI: 10.18653/v1/W17-4409
Bahar Salehi, Anders Søgaard
Abstract: Recent work in geolocation has made several hypotheses about which linguistic markers are relevant for detecting where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.
Citations: 1
Huntsville, hospitals, and hockey teams: Names can reveal your location
NUT@EMNLP, published 2017-09-01, DOI: 10.18653/v1/W17-4415
Bahar Salehi, Dirk Hovy, E. Hovy, Anders Søgaard
Abstract: Geolocation is the task of identifying a social media user's primary location, and in natural language processing there is a growing literature on the extent to which automated analysis of social media posts can help. However, not all content features are equally revealing of a user's location. In this paper, we evaluate nine named entity (NE) types. Using various metrics, we find that GEO-LOC, FACILITY and SPORT-TEAM are more informative for geolocation than other NE types. Using these types, we improve geolocation accuracy and reduce distance error over several well-known text-based methods.
Citations: 8
Multi-channel BiLSTM-CRF Model for Emerging Named Entity Recognition in Social Media
NUT@EMNLP, published 2017-09-01, DOI: 10.18653/v1/W17-4421
Bill Yuchen Lin, Frank F. Xu, Zhiyi Luo, Kenny Q. Zhu
Abstract: In this paper, we present our multi-channel neural architecture for recognizing emerging named entities in social media messages, which we applied in the Novel and Emerging Named Entity Recognition shared task at the EMNLP 2017 Workshop on Noisy User-generated Text (W-NUT). We propose a novel approach that incorporates comprehensive word representations with multi-channel information and Conditional Random Fields (CRF) into a traditional Bidirectional Long Short-Term Memory (BiLSTM) neural network, without using any additional hand-crafted features such as gazetteers. Compared with the other systems participating in the shared task, our system took second place.
Citations: 91
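The CRF layer in a BiLSTM-CRF tagger picks the best tag sequence by combining per-token emission scores (from the BiLSTM) with tag-to-tag transition scores. The function below is a minimal pure-Python sketch of standard Viterbi decoding for a linear-chain CRF. It is not the authors' implementation, and the toy scores are hypothetical. In the example, tags 0, 1 and 2 stand for O, B and I, and the transition O to I is penalized with a score of -10.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. from a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    if not emissions:
        return []
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


O_B_I_TRANSITIONS = [[0.0, 0.0, -10.0],  # from O: O->I is penalized
                     [0.0, 0.0, 1.0],    # from B: B->I is encouraged
                     [0.0, 0.0, 1.0]]    # from I: I->I is encouraged
print(viterbi_decode([[1.0, 0.5, 0.0], [0.0, 0.0, 2.0]], O_B_I_TRANSITIONS))  # [1, 2]
```

Per-token argmax would produce the invalid sequence O I here. The transition scores make the decoder prefer B I instead, which is the main reason CRF output layers are used on top of BiLSTMs for NER.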
Context-Sensitive Recognition for Emerging and Rare Entities
NUT@EMNLP, published 2017-09-01, DOI: 10.18653/v1/W17-4423
J. Williams, Giovanni C. Santia
Abstract: This paper is a shared task system description for the 2017 W-NUT shared task on Rare and Emerging Named Entities. Our paper describes the development and application of a novel algorithm for named entity recognition that relies only on the contexts of word forms. A comparison against the other submitted systems is provided.
Citations: 4
Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition
NUT@EMNLP, published 2017-09-01, DOI: 10.18653/v1/W17-4418
Leon Derczynski, Eric Nichols, M. Erp, Nut Limsopatham
Abstract: This shared task focuses on identifying unusual, previously unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarization), but recall on them is a real problem in noisy text, even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet "so.. kktny in 30 mins?!": even human experts find the entity 'kktny' hard to detect and resolve. The goal of this task is to provide a definition of emerging and of rare entities and, based on that definition, datasets for detecting these entities. The task as described in this paper evaluated the ability of participating entries to detect and classify novel and emerging named entities in noisy text.
Citations: 294
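Two scores appear in this listing for WNUT 2017: entity F1 and surface F1. The sketch below shows one way to compute both. It assumes entity F1 is exact-match F1 over individual mentions, while surface F1 first collapses repeated mentions into unique (surface string, type) pairs, so a system is not rewarded repeatedly for the same easy form. The official scoring script may differ in details such as case handling. The mention tuples and types are toy data.

```python
def f1(pred, gold):
    """Exact-match F1 between two sets of items."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def entity_f1(pred, gold):
    """Mention-level F1 over (doc_id, start, end, surface, type) tuples."""
    return f1(set(pred), set(gold))


def _surfaces(mentions):
    """Collapse mentions into unique (surface, type) pairs."""
    return {(m[3], m[4]) for m in mentions}


def surface_f1(pred, gold):
    """Surface-level F1: each distinct (surface, type) pair counts once."""
    return f1(_surfaces(pred), _surfaces(gold))


# Toy data: (doc_id, start, end, surface, type)
gold = [(0, 0, 1, "kktny", "creative-work"),
        (1, 3, 4, "kktny", "creative-work"),
        (1, 5, 6, "Paris", "location")]
pred = [(0, 0, 1, "kktny", "creative-work"),
        (1, 5, 6, "Paris", "person")]
print(entity_f1(pred, gold))   # about 0.4 (P = 1/2, R = 1/3)
print(surface_f1(pred, gold))  # 0.5 (P = 1/2, R = 1/2)
```

Here the second gold "kktny" mention lowers entity recall but not surface recall, because both mentions share one surface form.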
A Text Normalisation System for Non-Standard English Words
NUT@EMNLP, published 2017-09-01, DOI: 10.18653/v1/W17-4414
E. Flint, Elliot Ford, Olivia Thomas, Andrew Caines, P. Buttery
Abstract: This paper investigates the problem of text normalisation; specifically, the normalisation of non-standard words (NSWs) in English. Non-standard words can be defined as those word tokens which do not have a dictionary entry and cannot be pronounced using the usual letter-to-phoneme conversion rules, e.g. lbs, 99.3%, #EMNLP2017. NSWs pose a challenge to the proper functioning of text-to-speech technology, and the solution is to spell them out in such a way that they can be pronounced appropriately. We describe our four-stage normalisation system, made up of components for the detection, classification, division and expansion of NSWs. Performance is favourable compared to previous work in the field (Sproat et al. 2001, Normalization of non-standard words), as well as to state-of-the-art text-to-speech software. Further, we update Sproat et al.'s NSW taxonomy and create a more customisable system in which users can input their own abbreviations and specify into which variety of English (currently available: British or American) they wish to normalise.
Citations: 16
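The first two of the four stages in this abstract, detection and classification, can be sketched as a lexicon lookup followed by pattern matching. The lexicon, the abbreviation table, and the class names below are hypothetical simplifications. They are neither Sproat et al.'s taxonomy nor the authors' updated one. Division and expansion are not shown.

```python
import re

# Toy dictionary standing in for a real lexicon (hypothetical).
LEXICON = {"the", "box", "weighs", "won", "of", "votes"}
# User-editable abbreviation table, in the spirit of the customisable system.
ABBREVIATIONS = {"lbs": "pounds", "approx": "approximately"}

# Hypothetical NSW classes, checked in order (PERCENT before NUMBER).
CLASSES = [
    ("PERCENT", re.compile(r"^\d+(?:\.\d+)?%$")),
    ("NUMBER", re.compile(r"^\d+(?:\.\d+)?$")),
    ("HASHTAG", re.compile(r"^#\w+$")),
]


def detect_and_classify(tokens):
    """Detection: keep tokens missing from the lexicon.
    Classification: label each detected NSW with the first matching class."""
    out = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            continue  # ordinary dictionary word, not an NSW
        label = next((name for name, pattern in CLASSES if pattern.match(tok)), None)
        if label is None:
            label = "ABBREVIATION" if tok.lower() in ABBREVIATIONS else "UNKNOWN"
        out.append((tok, label))
    return out


print(detect_and_classify("The box weighs 12 lbs".split()))
# [('12', 'NUMBER'), ('lbs', 'ABBREVIATION')]
```

Keeping the abbreviation table as plain data is one simple way to let users add their own entries, as the abstract describes, without changing the classification code.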