Parallelization and Characterization of Probabilistic Latent Semantic Analysis

Chuntao Hong, Wenguang Chen, Weimin Zheng, Jiulong Shan, Yurong Chen, Yimin Zhang
{"title":"Parallelization and Characterization of Probabilistic Latent Semantic Analysis","authors":"Chuntao Hong, Wenguang Chen, Weimin Zheng, Jiulong Shan, Yurong Chen, Yimin Zhang","doi":"10.1109/ICPP.2008.8","DOIUrl":null,"url":null,"abstract":"Probabilistic Latent Semantic Analysis (PLSA) is one of the most popular statistical techniques for the analysis of two-model and co-occurrence data. It has applications in information retrieval and filtering, nature language processing, machine learning from text, and other related areas. However, PLSA is rarely applied to large datasets due to its high computational complexity.This paper presents an optimized and parallelized implementation of PLSA which is capable of processing datasets with 10000 documents in seconds. Compared to the baseline program, our parallelized program can achieve speedup of more than six on an eight-processor machine. The characterization of the parallel program is also presented. The performance analysis of the parallel program indicates that this program is memory intensive and the limited memory bandwidth is the bottleneck for better speedup.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 37th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2008.8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

Abstract

Probabilistic Latent Semantic Analysis (PLSA) is one of the most popular statistical techniques for the analysis of two-model and co-occurrence data. It has applications in information retrieval and filtering, nature language processing, machine learning from text, and other related areas. However, PLSA is rarely applied to large datasets due to its high computational complexity.This paper presents an optimized and parallelized implementation of PLSA which is capable of processing datasets with 10000 documents in seconds. Compared to the baseline program, our parallelized program can achieve speedup of more than six on an eight-processor machine. The characterization of the parallel program is also presented. The performance analysis of the parallel program indicates that this program is memory intensive and the limited memory bandwidth is the bottleneck for better speedup.
概率潜在语义分析的并行化与表征
概率潜在语义分析(PLSA)是目前最流行的用于分析双模型和共现数据的统计技术之一。它在信息检索和过滤、自然语言处理、文本机器学习和其他相关领域都有应用。然而,由于PLSA的高计算复杂度,它很少应用于大型数据集。本文提出了一种优化的并行化PLSA实现,它能够在几秒内处理10000个文档的数据集。与基准程序相比,我们的并行程序可以在8个处理器的机器上实现6个以上的加速。文中还介绍了并行程序的特性。对并行程序的性能分析表明,该程序占用大量内存,有限的内存带宽是实现更好加速的瓶颈。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信