Learning Word Embeddings in Parallel by Alignment

Sahil Zubair, M. Zubair
{"title":"通过对齐并行学习单词嵌入","authors":"Sahil Zubair, M. Zubair","doi":"10.1109/HPCS.2017.90","DOIUrl":null,"url":null,"abstract":"Distributed representations have become the de facto standard by which many modern neural network architectures deal with natural language processing tasks. In particular, the word2vec algorithm introduced by Mikolov, et al. popularized the use of distributed representations by demonstrating that learned embeddings capture semantic relationships geometrically. Though word2vec addresses some of the scaling issues of earlier approaches, it can still take days to complete the training process for very large data sets. Recently, researchers have tried to address this by proposing parallel variants of the word2vec algorithm. Note that in these approaches, the data set is partitioned among multiple processors that asynchronously update a shared model. We propose a parallel approach for word2vec that is based on instantiating multiple models and working with their own data sets. Our scheme transfers the learning between different models at discrete intervals (synchronously). The frequency with which we transfer the learning between different models is much less compared to the frequency of asynchronous updates in existing approaches. In our approach, we treat each of our instantiated word2vec instances as independent models. This implies that off the shelf implementations of word2vec can be used in our parallel approach. The key feature of our algorithm is in how we transfer the parameters between different models that have been independently trained using distinct partitions of a large data set. For this we propose a computationally inexpensive alignment and merge step. We validate our algorithm on a publicly available dataset using an implementation of word2vec in Google's tensorflow software. We evaluate our algorithm by comparing its runtime with the runtime of the sequential algorithm for a given training loss. Our results show that our parallel algorithm is able to achieve efficiency up to 57%.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"os-30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning Word Embeddings in Parallel by Alignment\",\"authors\":\"Sahil Zubair, M. Zubair\",\"doi\":\"10.1109/HPCS.2017.90\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distributed representations have become the de facto standard by which many modern neural network architectures deal with natural language processing tasks. In particular, the word2vec algorithm introduced by Mikolov, et al. popularized the use of distributed representations by demonstrating that learned embeddings capture semantic relationships geometrically. Though word2vec addresses some of the scaling issues of earlier approaches, it can still take days to complete the training process for very large data sets. Recently, researchers have tried to address this by proposing parallel variants of the word2vec algorithm. Note that in these approaches, the data set is partitioned among multiple processors that asynchronously update a shared model. We propose a parallel approach for word2vec that is based on instantiating multiple models and working with their own data sets. 
Our scheme transfers the learning between different models at discrete intervals (synchronously). The frequency with which we transfer the learning between different models is much less compared to the frequency of asynchronous updates in existing approaches. In our approach, we treat each of our instantiated word2vec instances as independent models. This implies that off the shelf implementations of word2vec can be used in our parallel approach. The key feature of our algorithm is in how we transfer the parameters between different models that have been independently trained using distinct partitions of a large data set. For this we propose a computationally inexpensive alignment and merge step. We validate our algorithm on a publicly available dataset using an implementation of word2vec in Google's tensorflow software. We evaluate our algorithm by comparing its runtime with the runtime of the sequential algorithm for a given training loss. Our results show that our parallel algorithm is able to achieve efficiency up to 57%.\",\"PeriodicalId\":115758,\"journal\":{\"name\":\"2017 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"os-30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCS.2017.90\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCS.2017.90","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Distributed representations have become the de facto standard by which many modern neural network architectures handle natural language processing tasks. In particular, the word2vec algorithm introduced by Mikolov et al. popularized distributed representations by demonstrating that learned embeddings capture semantic relationships geometrically. Although word2vec addresses some of the scaling issues of earlier approaches, training can still take days on very large data sets. Researchers have recently tried to address this by proposing parallel variants of word2vec in which the data set is partitioned among multiple processors that asynchronously update a shared model. We propose a different parallel approach: we instantiate multiple word2vec models, each trained on its own partition of the data, and transfer learning between them at discrete intervals (synchronously). The frequency of these transfers is much lower than the frequency of asynchronous updates in existing approaches. Because each instantiated word2vec instance is treated as an independent model, off-the-shelf implementations of word2vec can be used directly in our parallel scheme. The key feature of our algorithm is how parameters are transferred between models that have been trained independently on distinct partitions of a large data set; for this we propose a computationally inexpensive alignment-and-merge step. We validate the algorithm on a publicly available data set using a word2vec implementation in Google's TensorFlow software, comparing its runtime with that of the sequential algorithm at a given training loss. Our results show that the parallel algorithm achieves efficiency of up to 57%.
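The abstract describes the alignment-and-merge step only at a high level. As an illustration of the kind of operation it refers to, the sketch below aligns two independently trained embedding matrices with an orthogonal Procrustes rotation and then averages them; this is a common, inexpensive way to align embedding spaces, but the specific procedure, the function names (align_and_merge, get_embeddings), and the averaging choice are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def align_and_merge(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Align emb_b to emb_a's space, then merge by averaging.

    emb_a, emb_b: (vocab_size, dim) embedding matrices from two word2vec
    models trained independently on distinct partitions of the corpus,
    with rows indexed by a shared vocabulary.
    """
    # Orthogonal Procrustes: find the rotation W minimizing ||emb_b @ W - emb_a||_F.
    # Closed-form solution: W = U @ Vt, where U, S, Vt = SVD(emb_b.T @ emb_a).
    u, _, vt = np.linalg.svd(emb_b.T @ emb_a)
    w = u @ vt
    aligned_b = emb_b @ w
    # Merge the two models in the now-shared space by simple averaging.
    return 0.5 * (emb_a + aligned_b)

# Hypothetical usage at one synchronization point (method names are illustrative):
#   emb_a = model_a.get_embeddings()   # any off-the-shelf word2vec implementation
#   emb_b = model_b.get_embeddings()
#   merged = align_and_merge(emb_a, emb_b)
#   model_a.set_embeddings(merged); model_b.set_embeddings(merged)
```

The reported parallel efficiency presumably follows the usual definition, speedup over the sequential runtime divided by the number of workers, so 57% efficiency on p workers corresponds to a speedup of roughly 0.57 p at matched training loss.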