Learning Distributed Representations for Multilingual Text Sequences

VS@HLT-NAACL Pub Date : 2015-06-01 DOI:10.3115/v1/W15-1512
Hieu Pham, Thang Luong, Christopher D. Manning
{"title":"Learning Distributed Representations for Multilingual Text Sequences","authors":"Hieu Pham, Thang Luong, Christopher D. Manning","doi":"10.3115/v1/W15-1512","DOIUrl":null,"url":null,"abstract":"We propose a novel approach to learning distributed representations of variable-length text sequences in multiple languages simultaneously. Unlike previous work which often derive representations of multi-word sequences as weighted sums of individual word vectors, our model learns distributed representations for phrases and sentences as a whole. Our work is similar in spirit to the recent paragraph vector approach but extends to the bilingual context so as to efficiently encode meaning-equivalent text sequences of multiple languages in the same semantic space. Our learned embeddings achieve state-of-theart performance in the often used crosslingual document classification task (CLDC) with an accuracy of 92.7 for English to German and 91.5 for German to English. By learning text sequence representations as a whole, our model performs equally well in both classification directions in the CLDC task in which past work did not achieve.","PeriodicalId":299646,"journal":{"name":"VS@HLT-NAACL","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"61","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"VS@HLT-NAACL","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3115/v1/W15-1512","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 61

Abstract

We propose a novel approach to learning distributed representations of variable-length text sequences in multiple languages simultaneously. Unlike previous work which often derive representations of multi-word sequences as weighted sums of individual word vectors, our model learns distributed representations for phrases and sentences as a whole. Our work is similar in spirit to the recent paragraph vector approach but extends to the bilingual context so as to efficiently encode meaning-equivalent text sequences of multiple languages in the same semantic space. Our learned embeddings achieve state-of-theart performance in the often used crosslingual document classification task (CLDC) with an accuracy of 92.7 for English to German and 91.5 for German to English. By learning text sequence representations as a whole, our model performs equally well in both classification directions in the CLDC task in which past work did not achieve.
学习多语言文本序列的分布式表示
我们提出了一种同时学习多种语言的变长文本序列的分布式表示的新方法。与以往的工作不同,我们的模型通常将多词序列的表示作为单个词向量的加权和,而将短语和句子作为一个整体来学习分布式表示。我们的工作在精神上与最近的段落向量方法相似,但扩展到双语上下文,以便在同一语义空间中有效地编码多种语言的意义等效文本序列。我们学习的嵌入在常用的跨语言文档分类任务(CLDC)中实现了最先进的性能,英语到德语的准确率为92.7,德语到英语的准确率为91.5。通过整体学习文本序列表示,我们的模型在CLDC任务的两个分类方向上都表现良好,这是过去的工作无法实现的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信