Performance optimization of topic modeling algorithms using a graphic processing unit

Prateek Agrawal, Savi Kuriakose, William Middleton, Deyuan Guo, Dong Hyuk Kim, Ke Wang
{"title":"Performance optimization of topic modeling algorithms using a graphic processing unit","authors":"Prateek Agrawal, Savi Kuriakose, William Middleton, Deyuan Guo, Dong Hyuk Kim, Ke Wang","doi":"10.1109/SIEDS.2016.7489321","DOIUrl":null,"url":null,"abstract":"Text mining can be effectively deployed to improve our understanding of the real world by extracting relevant features such as cultural context from text data. Techniques such as topic models are shown to be useful in automatically extracting the topical or semantic content from unstructured data. Such a system should consume a large amount of text and extract meaningful patterns usually within a specified amount of time. Graphics Processing Unit is increasingly being used for computationally intensive tasks because of the inexpensive, high-performance raw processing power it has to offer. In this paper, we implement, test, and compare various topic modeling algorithms in a Graphics Processing Unit to achieve faster computing time compared to traditional implementations in a Central Processing Unit. The goal is to execute parallel Graphics Processing Unit versions of algorithms, such as Latent Dirichlet Allocation and Latent Semantic Analysis, and quantitatively assess the performance of each algorithm in comparison with serial or multi-core versions of the same topic modeling algorithms. The study aims to provide a comprehensive understanding of the effectiveness of a spectrum of Topic Model algorithms, the merits of such models in the Graphics Processing Unit, and the magnitude of efficiency improvement that can be achieved. Experimental results show that Topic Modeling algorithms can achieve 10x to 40x speedup in the GPU framework.","PeriodicalId":426864,"journal":{"name":"2016 IEEE Systems and Information Engineering Design Symposium (SIEDS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Systems and Information Engineering Design Symposium (SIEDS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIEDS.2016.7489321","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Text mining can be effectively deployed to improve our understanding of the real world by extracting relevant features, such as cultural context, from text data. Techniques such as topic models have been shown to be useful for automatically extracting topical or semantic content from unstructured data. Such a system should consume a large amount of text and extract meaningful patterns, usually within a specified amount of time. The Graphics Processing Unit (GPU) is increasingly used for computationally intensive tasks because of the inexpensive, high-performance raw processing power it offers. In this paper, we implement, test, and compare various topic modeling algorithms on a GPU to achieve faster computing times than traditional implementations on a Central Processing Unit (CPU). The goal is to execute parallel GPU versions of algorithms such as Latent Dirichlet Allocation and Latent Semantic Analysis, and to quantitatively assess the performance of each against serial or multi-core versions of the same topic modeling algorithms. The study aims to provide a comprehensive understanding of the effectiveness of a spectrum of topic modeling algorithms, the merits of running such models on the GPU, and the magnitude of the efficiency improvement that can be achieved. Experimental results show that topic modeling algorithms can achieve 10x to 40x speedups in the GPU framework.
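The abstract gives no implementation details, so the sketch below is only a hypothetical illustration (not the authors' code) of the kind of serial or multi-core CPU baseline against which GPU versions would be compared: it times Latent Dirichlet Allocation and Latent Semantic Analysis on a small corpus using scikit-learn. The corpus, vocabulary size, topic count, and iteration count are assumptions chosen purely for illustration.

```python
# Hypothetical CPU baseline (not the paper's implementation): time LDA and LSA
# on a small corpus to establish the serial/multi-core reference point that a
# GPU version would be benchmarked against.
import time

from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus (an assumption; the abstract does not specify the data set).
# fetch_20newsgroups downloads the data on first use.
docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data[:2000]

# LDA operates on raw term counts; LSA (truncated SVD) is usually run on TF-IDF.
counts = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)
tfidf = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(docs)

def timed(label, fn):
    """Run fn once and print the wall-clock time."""
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.2f} s")

# Multi-core LDA via batch variational Bayes (n_jobs=-1 uses all CPU cores).
lda = LatentDirichletAllocation(n_components=20, max_iter=10,
                                learning_method="batch", n_jobs=-1, random_state=0)
timed("LDA (multi-core CPU)", lambda: lda.fit(counts))

# LSA as a rank-20 truncated SVD of the TF-IDF matrix.
lsa = TruncatedSVD(n_components=20, random_state=0)
timed("LSA (CPU)", lambda: lsa.fit(tfidf))
```

A GPU implementation would replace these routines with parallel kernels (for example, parallelized Gibbs sampling or variational updates for LDA and a GPU-based truncated SVD for LSA); the 10x to 40x speedups reported in the paper are measured relative to serial or multi-core CPU runs of this kind.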