Latest Publications from NUT@EMNLP

Combining Human and Machine Transcriptions on the Zooniverse Platform
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6129
Daniel Hanson, A. Simenstad
Abstract: Transcribing handwritten documents to create fully searchable texts is an essential part of the archival process. Traditional text recognition methods, such as optical character recognition (OCR), do not work on handwritten documents due to their frequent noisiness and OCR's need for individually segmented letters. Crowdsourcing and improved machine models are two modern methods for transcribing handwritten documents.
Citations: 1
A Case Study on Learning a Unified Encoder of Relations
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6126
Lisheng Fu, Bonan Min, Thien Huu Nguyen, R. Grishman
Abstract: Typical relation extraction models are trained on a single corpus annotated with a pre-defined relation schema. An individual corpus is often small, and the models may be biased or overfitted to the corpus. We hypothesize that we can learn a better representation by combining multiple relation datasets. We attempt to use a shared encoder to learn the unified feature representation and to augment it with regularization by adversarial training. The additional corpora feeding the encoder can help to learn a better feature representation layer even though the relation schemas are different. We use ACE05 and ERE datasets as our case study for experiments. The multi-task model obtains significant improvement on both datasets.
Citations: 4
FrameIt: Ontology Discovery for Noisy User-Generated Text
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6123
Dan Iter, A. Halevy, W. Tan
Abstract: A common need of NLP applications is to extract structured data from text corpora in order to perform analytics or trigger an appropriate action. The ontology defining the structure is typically application dependent and in many cases it is not known a priori. We describe the FrameIt system that provides a workflow for (1) quickly discovering an ontology to model a text corpus and (2) learning an SRL model that extracts the instances of the ontology from sentences in the corpus. FrameIt exploits data that is obtained in the ontology discovery phase as weak supervision data to bootstrap the SRL model and then enables the user to refine the model with active learning. We present empirical results and qualitative analysis of the performance of FrameIt on three corpora of noisy user-generated text.
Citations: 1
Word-like character n-gram embedding
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6120
Geewook Kim, Kazuki Fukui, Hidetoshi Shimodaira
Abstract: We propose a new word embedding method called word-like character n-gram embedding, which learns distributed representations of words by embedding word-like character n-grams. Our method is an extension of recently proposed segmentation-free word embedding, which directly embeds frequent character n-grams from a raw corpus. However, its n-gram vocabulary tends to contain too many non-word n-grams. We solved this problem by introducing an idea of expected word frequency. Compared to the previously proposed methods, our method can embed more words, along with the words that are not included in a given basic word dictionary. Since our method does not rely on word segmentation with rich word dictionaries, it is especially effective when the text in the corpus is in an unsegmented language and contains many neologisms and informal words (e.g., a Chinese SNS dataset). Our experimental results on Sina Weibo (a Chinese microblog service) and Twitter show that the proposed method can embed more words and improve the performance of downstream tasks.
Citations: 5
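The segmentation-free starting point this abstract describes — collecting frequent character n-grams directly from raw, unsegmented text as candidate word-like units — can be illustrated with a minimal sketch. The function name, thresholds, and toy corpus below are hypothetical illustrations; the paper's actual method goes further by weighting candidates with an expected word frequency to filter out non-word n-grams.

```python
from collections import Counter

def frequent_char_ngrams(corpus, n_min=2, n_max=3, min_count=2):
    """Count every character n-gram of length n_min..n_max in the raw
    corpus and keep those above a frequency threshold as candidate
    word-like units (no word segmentation required)."""
    counts = Counter()
    for line in corpus:
        text = line.strip()
        for n in range(n_min, n_max + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy raw corpus: "the" and "cat" recur, so they survive the cutoff.
corpus = ["the cat sat", "the cat ran"]
vocab = frequent_char_ngrams(corpus, n_min=2, n_max=3, min_count=2)
```

A frequency cutoff alone keeps many spurious strings that merely span word boundaries (e.g. "e c"), which is exactly the non-word-n-gram problem the paper's expected-word-frequency idea addresses.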
Low-resource named entity recognition via multi-source projection: Not quite there yet?
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6125
Jan Vium Enghoff, S. Harrison, Zeljko Agic
Abstract: Projecting linguistic annotations through word alignments is one of the most prevalent approaches to cross-lingual transfer learning. Conventional wisdom suggests that annotation projection "just works" regardless of the task at hand. We carefully consider multi-source projection for named entity recognition. Our experiment with 17 languages shows that to detect named entities in true low-resource languages, annotation projection may not be the right way to move forward. On a more positive note, we also uncover the conditions that do favor named entity projection from multiple sources. We argue these are infeasible under noisy low-resource constraints.
Citations: 15
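The core mechanism the abstract presumes — carrying annotations from a high-resource source sentence to a target sentence through word alignments — can be sketched as follows. This is a hypothetical single-source illustration with made-up tags and alignment pairs, not the paper's system; a multi-source setup would aggregate projections from several source languages (e.g., by voting) before committing to a target tag.

```python
def project_ner_tags(src_tags, alignment, tgt_len):
    """Project source-side NER tags onto the target sentence through
    word alignments; target tokens with no aligned source token
    fall back to the 'O' (outside) tag.

    alignment: list of (src_index, tgt_index) pairs."""
    tgt_tags = ["O"] * tgt_len
    for s, t in alignment:
        tgt_tags[t] = src_tags[s]
    return tgt_tags

# Source: "Barack Obama visited Paris"; a hypothetical 4-token target
# with reordered word alignments.
src_tags = ["B-PER", "I-PER", "O", "B-LOC"]
alignment = [(0, 1), (1, 2), (3, 0)]
tgt_tags = project_ner_tags(src_tags, alignment, tgt_len=4)
# tgt_tags == ["B-LOC", "B-PER", "I-PER", "O"]
```

The sketch also makes the failure mode visible: any noise in the alignments (a missing or wrong pair) directly corrupts the projected labels, which is the fragility the paper investigates under low-resource conditions.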
A POS Tagging Model Adapted to Learner English
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6106
Ryo Nagata, Tomoya Mizumoto, Yuta Kikuchi, Yoshifumi Kawasaki, Kotaro Funakoshi
Abstract: There has been very limited work on the adaptation of Part-Of-Speech (POS) tagging to learner English despite the fact that POS tagging is widely used in related tasks. In this paper, we explore how we can adapt POS tagging to learner English efficiently and effectively. Based on the discussion of possible causes of POS tagging errors in learner English, we show that deep neural models are particularly suitable for this. Considering the previous findings and the discussion, we introduce the design of our model based on bidirectional Long Short-Term Memory. In addition, we describe how to adapt it to a wide variety of native languages (potentially, hundreds of them). In the evaluation section, we empirically show that it is effective for POS tagging in learner English, achieving an accuracy of 0.964, which significantly outperforms the state-of-the-art POS tagger. We further investigate the tagging results in detail, revealing which parts of the model design do or do not improve the performance.
Citations: 5
Using Wikipedia Edits in Low Resource Grammatical Error Correction
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6111
Adriane Boyd
Abstract: We develop a grammatical error correction (GEC) system for German using a small gold GEC corpus augmented with edits extracted from Wikipedia revision history. We extend the automatic error annotation tool ERRANT (Bryant et al., 2017) for German and use it to analyze both gold GEC corrections and Wikipedia edits (Grundkiewicz and Junczys-Dowmunt, 2014) in order to select as additional training data Wikipedia edits containing grammatical corrections similar to those in the gold corpus. Using a multilayer convolutional encoder-decoder neural network GEC approach (Chollampatt and Ng, 2018), we evaluate the contribution of Wikipedia edits and find that carefully selected Wikipedia edits increase performance by over 5%.
Citations: 51
Preferred Answer Selection in Stack Overflow: Better Text Representations ... and Metadata, Metadata, Metadata
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6119
Steven Xu, Andrew Bennett, D. Hoogeveen, Jey Han Lau, Timothy Baldwin
Abstract: Community question answering (cQA) forums provide a rich source of data for facilitating non-factoid question answering over many technical domains. Given this, there is considerable interest in answer retrieval from these kinds of forums. However, this is a difficult task, as the structure of these forums is very rich, and both metadata and text features are important for successful retrieval. While there has recently been a lot of work on solving this problem using deep learning models applied to question/answer text, this work has not looked at how to make use of the rich metadata available in cQA forums. We propose an attention-based model which achieves state-of-the-art results for text-based answer selection alone, and, by making use of complementary metadata, achieves a substantially higher result over two reference datasets novel to this work.
Citations: 4
Inducing a lexicon of sociolinguistic variables from code-mixed text
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6101
Philippa Shoemark, James P. Kirby, S. Goldwater
Abstract: Sociolinguistics is often concerned with how variants of a linguistic item (e.g., nothing vs. nothin') are used by different groups or in different situations. We introduce the task of inducing lexical variables from code-mixed text: that is, identifying equivalence pairs such as (football, fitba) along with their linguistic code (football→British, fitba→Scottish). We adapt a framework for identifying gender-biased word pairs to this new task, and present results on three different pairs of English dialects, using tweets as the code-mixed text. Our system achieves precision of over 70% for two of these three datasets, and produces useful results even without extensive parameter tuning. Our success in adapting this framework from gender to language variety suggests that it could be used to discover other types of analogous pairs as well.
Citations: 8
Convolutions Are All You Need (For Classifying Character Sequences)
NUT@EMNLP Pub Date : 2018-11-01 DOI: 10.18653/v1/W18-6127
Zach Wood-Doughty, Nicholas Andrews, Mark Dredze
Abstract: While recurrent neural networks (RNNs) are widely used for text classification, they demonstrate poor performance and slow convergence when trained on long sequences. When text is modeled as characters instead of words, the longer sequences make RNNs a poor choice. Convolutional neural networks (CNNs), although somewhat less ubiquitous than RNNs, have an internal structure more appropriate for long-distance character dependencies. To better understand how CNNs and RNNs differ in handling long sequences, we use them for text classification tasks on several character-level social media datasets. The CNN models vastly outperform the RNN models in our experiments, suggesting that CNNs are superior to RNNs at learning to classify character-level data.
Citations: 7
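The building block behind such character-level CNNs — sliding a filter over the character sequence and max-pooling the responses into a position-independent feature — can be shown with a deliberately tiny, dependency-free sketch. Real models convolve many learned filters over character embedding vectors; here a single hand-picked kernel runs over scalar character codes purely to illustrate the windowing and pooling mechanics, and all names and values are hypothetical.

```python
def conv1d_maxpool(seq, kernel, stride=1):
    """Valid 1-D convolution of a scalar sequence with one kernel,
    followed by global max-pooling, as in one char-CNN feature map."""
    width = len(kernel)
    responses = []
    for i in range(0, len(seq) - width + 1, stride):
        responses.append(sum(seq[i + j] * kernel[j] for j in range(width)))
    return max(responses)

# Characters encoded as small integers; one edge-detecting filter
# of width 3 (first char minus last char in each window).
text = "abcab"
seq = [ord(c) - ord("a") for c in text]      # [0, 1, 2, 0, 1]
feature = conv1d_maxpool(seq, kernel=[1.0, 0.0, -1.0])
# windows: 0-2 = -2, 1-0 = 1, 2-1 = 1  ->  max-pooled feature 1.0
```

Because the same kernel is applied at every offset and only the maximum survives, the feature responds to a local character pattern wherever it occurs, which is what makes convolutions effective on long character sequences.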