Latest Publications in VS@HLT-NAACL

Short Text Clustering via Convolutional Neural Networks
VS@HLT-NAACL Pub Date : 2015-06-01 DOI: 10.3115/v1/W15-1509
Jiaming Xu, Peng Wang, Guanhua Tian, Bo Xu, Jun Zhao, Fangyuan Wang, Hongwei Hao
Abstract: Short text clustering has become an increasingly important task with the popularity of social media, and it is a challenging problem due to the sparseness of short text representations. In this paper, we propose Short Text Clustering via Convolutional neural networks (STCC), which benefits clustering by imposing a constraint on the learned features through a self-taught learning framework, without using any external tags or labels. First, we embed the original keyword features into compact binary codes with a locality-preserving constraint. Then, word embeddings are fed into convolutional neural networks to learn deep feature representations, with the output units fitting the pre-trained binary codes during training. After obtaining the learned representations, we cluster them with K-means. Our extensive experimental study on two public short text datasets shows that the deep feature representations learned by our approach achieve significantly better clustering performance than existing features such as term frequency-inverse document frequency, Laplacian eigenvectors, and average embeddings.
Citations: 148
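The final step the STCC abstract describes — running K-means over the learned deep representations — can be sketched as follows. This is a minimal toy sketch, not the paper's code: the "representations" here are random well-separated blobs standing in for CNN outputs, and the deterministic initialization is an illustrative simplification.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain K-means: the clustering step applied to learned representations."""
    # Deterministic initialization for the sketch: evenly spaced rows of X.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy stand-in for "deep feature representations": two well-separated groups.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0.0, 0.1, (5, 4)),   # five vectors near the origin
                   rng.normal(3.0, 0.1, (5, 4))])  # five vectors near (3, 3, 3, 3)
labels = kmeans(feats, k=2)
```

With two tight, well-separated groups, the assignment recovers the grouping regardless of which label index each group receives.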
Word Embeddings vs Word Types for Sequence Labeling: the Curious Case of CV Parsing
VS@HLT-NAACL Pub Date : 2015-06-01 DOI: 10.3115/v1/W15-1517
Melanie Tosik, C. Hansen, Gerard Goossen, M. Rotaru
Abstract: We explore new methods of improving Curriculum Vitae (CV) parsing for German documents by applying recent research on the application of word embeddings in Natural Language Processing (NLP). Our approach integrates the word embeddings as input features for a probabilistic sequence labeling model that relies on the Conditional Random Field (CRF) framework. The best-performing word embeddings are generated from a large sample of German CVs. The best results on the extraction task are obtained by the model that integrates the word embeddings together with a number of hand-crafted features. The improvements are consistent throughout different sections of the target documents. The effect of the word embeddings is strongest on semi-structured, out-of-sample data.
Citations: 13
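The combination the abstract describes — hand-crafted token features alongside word-embedding dimensions as input to a CRF labeler — amounts to building per-token feature dictionaries. The sketch below shows only that feature-construction step, with a hypothetical two-dimensional embedding table; the actual features and embeddings in the paper differ.

```python
import numpy as np

# Hypothetical tiny embedding table; the paper trains embeddings on a large CV corpus.
emb = {"Java": np.array([0.2, -0.1]), "2010": np.array([-0.3, 0.4])}

def token_features(tokens, i):
    """Feature dict for position i: a few hand-crafted features plus, when
    available, the raw embedding dimensions as continuous CRF features."""
    w = tokens[i]
    feats = {
        "word.lower": w.lower(),
        "word.isdigit": w.isdigit(),
        "word.istitle": w.istitle(),
    }
    vec = emb.get(w)
    if vec is not None:
        for d, v in enumerate(vec):
            feats[f"emb_{d}"] = float(v)  # one continuous feature per dimension
    return feats
```

Tokens missing from the embedding table simply fall back to the discrete features alone.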
Morpho-syntactic Regularities in Continuous Word Representations: A multilingual study.
VS@HLT-NAACL Pub Date : 2015-06-01 DOI: 10.3115/v1/W15-1518
Garrett Nicolai, Colin Cherry, Grzegorz Kondrak
Abstract: We replicate the syntactic experiments of Mikolov et al. (2013b) on English and expand them to include morphologically complex languages. We learn vector representations for Dutch, French, German, and Spanish with the WORD2VEC tool, and investigate to what extent inflectional information is preserved across vectors. We observe that the accuracy of vectors on a set of syntactic analogies is inversely correlated with the morphological complexity of the language.
Citations: 8
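The syntactic analogy test this abstract refers to answers "a is to b as c is to ?" by vector arithmetic: find the word closest to v(b) - v(a) + v(c), excluding the query words. A minimal sketch with hand-made toy vectors (the real experiments use WORD2VEC vectors per language):

```python
import numpy as np

# Toy embedding table chosen so the inflection offset is consistent.
vecs = {
    "walk":   np.array([1.0, 0.0]),
    "walked": np.array([1.0, 1.0]),
    "jump":   np.array([0.0, 1.0]),
    "jumped": np.array([0.0, 2.0]),
    "run":    np.array([2.0, 0.0]),
}

def analogy(a, b, c, vocab):
    """Return the vocabulary word whose vector has the highest cosine
    similarity to v(b) - v(a) + v(c), excluding the three query words."""
    target = vocab[b] - vocab[a] + vocab[c]
    def cos(u, v):
        return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in vocab if w not in (a, b, c)),
               key=lambda w: cos(vocab[w], target))
```

Accuracy on an analogy set is then simply the fraction of questions where the returned word matches the gold inflected form.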
Relation Extraction: Perspective from Convolutional Neural Networks
VS@HLT-NAACL Pub Date : 2015-06-01 DOI: 10.3115/v1/W15-1506
Thien Huu Nguyen, R. Grishman
Abstract: Up to now, relation extraction systems have made extensive use of features generated by linguistic analysis modules. Errors in these features lead to errors in relation detection and classification. In this work, we depart from these traditional approaches, with their complicated feature engineering, by introducing a convolutional neural network for relation extraction that automatically learns features from sentences and minimizes the dependence on external toolkits and resources. Our model takes advantage of multiple window sizes for filters and of pre-trained word embeddings as an initializer in a non-static architecture to improve performance. We emphasize the relation extraction problem on an unbalanced corpus. The experimental results show that our system significantly outperforms not only the best baseline systems for relation extraction but also the state-of-the-art systems for relation classification.
Citations: 464
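The "multiple window sizes for filters" idea in this abstract means running convolutional filters of several widths over the sentence's embedding matrix and max-pooling each one into a fixed-length vector. A numpy sketch with random weights and made-up dimensions (the paper's filter counts, widths, and embedding size differ):

```python
import numpy as np

rng = np.random.default_rng(0)
sent = rng.normal(size=(7, 5))   # 7 tokens, embedding dimension 5 (toy values)
windows = (2, 3, 4)              # several filter widths, as the model uses
n_filters = 6                    # illustrative count per width

pooled = []
for w in windows:
    filt = rng.normal(size=(n_filters, w, 5))  # random weights for the sketch
    # Slide each width-w filter over every length-w token window.
    conv = np.array([[(sent[i:i + w] * filt[f]).sum()
                      for i in range(len(sent) - w + 1)]
                     for f in range(n_filters)])
    pooled.append(conv.max(axis=1))  # max-over-time pooling: one value per filter
# Concatenating the pooled outputs gives a fixed-length sentence representation
# that a softmax classifier can score for each relation type.
features = np.concatenate(pooled)
```

The key property is that the output length (here 3 widths × 6 filters = 18) is independent of sentence length.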
Semantic Information Extraction for Improved Word Embeddings
VS@HLT-NAACL Pub Date : 2015-06-01 DOI: 10.3115/v1/W15-1523
Jiaqiang Chen, Gerard de Melo
Abstract: Word embeddings have recently proven useful in a number of different applications that deal with natural language. Such embeddings succinctly reflect semantic similarities between words based on their sentence-internal contexts in large corpora. In this paper, we show that information extraction techniques provide valuable additional evidence of semantic relationships that can be exploited when producing word embeddings. We propose a joint model to train word embeddings both on regular context information and on more explicit semantic extractions. The word vectors obtained from such augmented joint training show improved results on word similarity tasks, suggesting that they can be useful in applications that involve word meanings.
Citations: 14