Devise Sparse Compression Schedulers to Enhance FastText Methods

Chen-Ting Chao, Wei-Hsu Chu, Chao-Lin Lee, Jenq-Kuen Lee, Ming-Yu Hung, Hsiang-Wei Sung
DOI: 10.1145/3409390.3409394
Published in: Workshop Proceedings of the 49th International Conference on Parallel Processing, 2020-08-17
Citations: 1

Abstract

In natural language processing (NLP), the standard way to capture the meaning of a word is through word embedding. A word embedding training model converts words into multidimensional vectors, turning symbols without inherent "meaning" into vectors that encode meaning. Well-known word embedding models include FastText, Word2Vec, and GloVe; they train words into vectors that can then be used for further semantic classification. In this paper, we work on efficient support for FastText, an open-source library created by Facebook's FAIR lab that allows users to learn word embeddings and text classification. We focus on the word representation application in FastText, in which general matrix-vector multiplication (GEMV) is one of the most computationally intensive operations. We adjust the software architecture of FastText and pre-process the pre-trained model offline. In addition, we introduce a new acceleration method based on sparse matrix compression in Halide, which improves performance by compressing the matrix. Our Halide sparse compression schedulers include hybrid compression schemes and re-ordering methods to further improve performance.