NUT@EMNLP: Latest Publications

Boundary-based MWE segmentation with text partitioning
NUT@EMNLP Pub Date : 2016-08-05 DOI: 10.18653/v1/W17-4401
J. Williams
Abstract: This submission describes the development of a fine-grained, text-chunking algorithm for the task of comprehensive MWE segmentation. This task notably focuses on the identification of colloquial and idiomatic language. The submission also includes a thorough model evaluation in the context of two recent shared tasks, spanning 19 different languages and many text domains, including noisy, user-generated text. Evaluations exhibit the presented model as the best overall for purposes of MWE segmentation, and open-source software is released with the submission (although links are withheld for purposes of anonymity). Additionally, the authors acknowledge the existence of a pre-print document on arxiv.org, which should be avoided to maintain anonymity in review.
Citations: 7
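The Williams entry above treats MWE segmentation as chunking running text. A generic way to sketch boundary-based chunking is to decide, for every gap between adjacent tokens, whether the gap is bound (inside an expression) or broken (between expressions). The Python below is a minimal, hypothetical illustration of that idea. It is not the paper's algorithm or its released software.

The sketch makes three assumptions:
- The training data is annotated as lists of chunks.
- A per-token-pair probability that the gap is bound can be estimated from that data.
- Two tokens should be joined when that probability exceeds a `threshold`.

The names `train_bound_probs`, `segment`, `threshold` and `TOY_TRAINING` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical annotated data: each sentence is a list of chunks, and each
# chunk is a list of tokens (multi-token chunks play the role of MWEs).
TOY_TRAINING = [
    [["kick", "the", "bucket"], ["today"]],
    [["he"], ["will"], ["kick", "the", "bucket"]],
    [["kick"], ["the"], ["ball"]],
]


def train_bound_probs(segmented_sentences):
    """Estimate P(gap is bound | left token, right token) from annotated chunks."""
    bound = defaultdict(int)
    seen = defaultdict(int)
    for chunks in segmented_sentences:
        tokens = [tok.lower() for chunk in chunks for tok in chunk]
        # Gap i sits between tokens[i] and tokens[i + 1]; collect the gaps
        # that fall inside a chunk.
        bound_gaps = set()
        start = 0
        for chunk in chunks:
            bound_gaps.update(range(start, start + len(chunk) - 1))
            start += len(chunk)
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            seen[pair] += 1
            if i in bound_gaps:
                bound[pair] += 1
    return {pair: bound[pair] / seen[pair] for pair in seen}


def segment(tokens, probs, threshold=0.5):
    """Join adjacent tokens whose gap is bound with probability > threshold.

    Token pairs never seen in training are treated as broken boundaries.
    """
    if not tokens:
        return []
    chunks = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        p = probs.get((prev.lower(), cur.lower()), 0.0)
        if p > threshold:
            chunks[-1].append(cur)
        else:
            chunks.append([cur])
    return chunks


if __name__ == "__main__":
    probs = train_bound_probs(TOY_TRAINING)
    print(segment(["They", "kick", "the", "bucket"], probs))
```

In the toy data, "kick the" is bound in two of three sentences and "the bucket" in both of its occurrences. The demo therefore prints `[['They'], ['kick', 'the', 'bucket']]`. Unseen pairs default to broken, so this sketch can only propose expressions it has already observed, which is a clear limitation compared with a model built for noisy, open-domain text.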
An Entity Resolution Approach to Isolate Instances of Human Trafficking Online
NUT@EMNLP Pub Date : 2015-09-22 DOI: 10.18653/v1/W17-4411
Chirag Nagpal, K. Miller, Benedikt Boecking, A. Dubrawski
Abstract: Human trafficking is a challenging law enforcement problem, and traces of victims of such activity manifest as ‘escort advertisements’ on various online forums. Given the large, heterogeneous and noisy structure of this data, building models to predict instances of trafficking is a convoluted task. In this paper we propose an entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms of scalability, and some significant domain specific characteristics of our resolved entities.
Citations: 27
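The entity resolution entry above describes grouping millions of ad records into clusters. A common building block for that kind of pipeline, though not necessarily the one the authors used, is to link any two records that share a strong identifier and then take connected components. The sketch below does this with a union-find structure. The field names `phone` and `email`, the function `resolve`, and the `TOY_RECORDS` data are hypothetical. The sketch omits the paper's proxy-label step entirely.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by attachment."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to flatten the tree.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve(records, keys=("phone", "email")):
    """Cluster records that share any value of the given (hypothetical) keys."""
    uf = UnionFind(len(records))
    first_seen = {}  # (key, value) -> index of the first record carrying it
    for i, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if not value:
                continue  # missing or empty identifiers never link records
            if (key, value) in first_seen:
                uf.union(first_seen[(key, value)], i)
            else:
                first_seen[(key, value)] = i
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(uf.find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy records, not data from the paper.
TOY_RECORDS = [
    {"phone": "555-0101"},
    {"phone": "555-0101", "email": "a@example.com"},
    {"email": "a@example.com"},
    {"phone": "555-0199"},
]

if __name__ == "__main__":
    print(resolve(TOY_RECORDS))
```

Records 0 and 1 share a phone number, and records 1 and 2 share an email address. All three therefore fall into one cluster, `[[0, 1, 2], [3]]`, even though records 0 and 2 have nothing in common directly. This transitive chaining is what makes connected components cheap to compute at scale. It is also why a single noisy shared value can merge unrelated entities.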
Classification of Tweets about Reported Events using Neural Networks
NUT@EMNLP Pub Date : unknown DOI: 10.18653/v1/W18-6121
Kiminobu Makino, Yuka Takei, Taro Miyazaki, Jun Goto
Abstract: We developed a system that automatically extracts “Event-describing Tweets” which include incidents or accidents information for creating news reports. Event-describing Tweets can be classified into “Reported-event Tweets” and “New-information Tweets.” Reported-event Tweets cite news agencies or user generated content sites, and New-information Tweets are other Event-describing Tweets. A system is needed to classify them so that creators of factual TV programs can use them in their productions. Proposing this Tweet classification task is one of the contributions of this paper, because no prior papers have used the same task even though program creators and other events information collectors have to do it to extract required information from social networking sites. To classify Tweets in this task, this paper proposes a method to input and concatenate character and word sequences in Japanese Tweets by using convolutional neural networks. This proposed method is another contribution of this paper. For comparison, character or word input methods and other neural networks are also used. Results show that a system using the proposed method and architectures can classify Tweets with an F1 score of 88 %.
Citations: 3
Twitter Geolocation using Knowledge-Based Methods
NUT@EMNLP Pub Date : unknown DOI: 10.18653/v1/W18-6102
Taro Miyazaki, Afshin Rahimi, Trevor Cohn, Timothy Baldwin
Abstract: Automatic geolocation of microblog posts from their text content is particularly difficult because many location-indicative terms are rare terms, notably entity names such as locations, people or local organisations. Their low frequency means that key terms observed in testing are often unseen in training, such that standard classifiers are unable to learn weights for them. We propose a method for reasoning over such terms using a knowledge base, through exploiting their relations with other entities. Our technique uses a graph embedding over the knowledge base, which we couple with a text representation to learn a geolocation classifier, trained end-to-end. We show that our method improves over purely text-based methods, which we ascribe to more robust treatment of low-count and out-of-vocabulary entities.
Citations: 15
Step or Not: Discriminator for The Real Instructions in User-generated Recipes
NUT@EMNLP Pub Date : unknown DOI: 10.18653/v1/W18-6128
Shintaro Inuzuka, Takahiko Ito, Jun Harashima
Abstract: In a recipe sharing service, users publish recipe instructions in the form of a series of steps. However, some of the “steps” are not actually part of the cooking process. Specifically, advertisements of recipes themselves (e.g., “introduced on TV”) and comments (e.g., “Thanks for many messages”) may often be included in the step section of the recipe, like the recipe author’s communication tool. However, such fake steps can cause problems when using recipe search indexing or when being spoken by devices such as smart speakers. As presented in this talk, we have constructed a discriminator that distinguishes between such a fake step and the step actually used for cooking. This project includes, but is not limited to, the creation of annotation data by classifying and analyzing recipe steps and the construction of identification models. Our models use only text information to identify the step. In our test, machine learning models achieved higher accuracy than rule-based methods that use manually chosen clue words.
Citations: 1