Parallel PageRank computation using GPUs

N. Duong, Q. Nguyen, Anh Nguyen, Huu-Duc Nguyen
{"title":"Parallel PageRank computation using GPUs","authors":"N. Duong, Q. Nguyen, Anh Nguyen, Huu-Duc Nguyen","doi":"10.1145/2350716.2350751","DOIUrl":null,"url":null,"abstract":"Fast & efficient computing of web rank scores is a necessary issue of search engines today. Because of the enormous size of data and the dynamic nature of World Wide Web, this computation is generally executed on large web graphs (to billions webpages) and requires refreshing quite often, so it becomes a challenging task. In this paper, we propose an efficient method for computing PageRank score -- a Google ranking method based on analyzing the link structure of the Web on graphics processing units (GPUs). We have employed a slightly modification of a storage data format called binary 'link structure file' which inspirited from [2] for storing the web graph data. We then divided the PageRank calculating phases into parallel operations for exploiting the computing power of the graphics cards. Our program was written in CUDA language to experiment on a system equipped two double NVIDIA GeForce GTX 295 graphics cards, using two real datasets which were crawled from Vietnamese sites containing 7 million pages, 132 million links and 15 million pages, 200 million links, respectively. The experimental results showed that the computation speed increase from 10 to 20 times when compared to a CPU Intel Q8400 at 2.67 GHz based version, on both datasets. Our method can also scale up well for larger web graphs.","PeriodicalId":208300,"journal":{"name":"Proceedings of the 3rd Symposium on Information and Communication Technology","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd Symposium on Information and Communication Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2350716.2350751","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 27

Abstract

Fast & efficient computing of web rank scores is a necessary issue of search engines today. Because of the enormous size of data and the dynamic nature of World Wide Web, this computation is generally executed on large web graphs (to billions webpages) and requires refreshing quite often, so it becomes a challenging task. In this paper, we propose an efficient method for computing PageRank score -- a Google ranking method based on analyzing the link structure of the Web on graphics processing units (GPUs). We have employed a slightly modification of a storage data format called binary 'link structure file' which inspirited from [2] for storing the web graph data. We then divided the PageRank calculating phases into parallel operations for exploiting the computing power of the graphics cards. Our program was written in CUDA language to experiment on a system equipped two double NVIDIA GeForce GTX 295 graphics cards, using two real datasets which were crawled from Vietnamese sites containing 7 million pages, 132 million links and 15 million pages, 200 million links, respectively. The experimental results showed that the computation speed increase from 10 to 20 times when compared to a CPU Intel Q8400 at 2.67 GHz based version, on both datasets. Our method can also scale up well for larger web graphs.
使用gpu并行计算PageRank
快速高效地计算网站排名分数是搜索引擎今天的一个必要问题。由于数据的巨大规模和万维网的动态特性,这种计算通常在大型网络图形(数十亿个网页)上执行,并且需要经常刷新,因此它成为一项具有挑战性的任务。在本文中,我们提出了一种计算PageRank分数的有效方法——基于分析图形处理单元(gpu)上Web的链接结构的谷歌排名方法。我们略微修改了存储数据的格式,称为二进制“链接结构文件”,它的灵感来自[2],用于存储web图形数据。然后,我们将PageRank计算阶段划分为并行操作,以利用显卡的计算能力。我们的程序是用CUDA语言编写的,在一个配备了两个双NVIDIA GeForce GTX 295显卡的系统上进行实验,使用了两个真实的数据集,分别是从越南网站上抓取的,分别包含700万页,1.32亿链接和1500万页,2亿链接。实验结果表明,在两个数据集上,与基于2.67 GHz的CPU Intel Q8400相比,计算速度提高了10到20倍。我们的方法也可以很好地扩展到更大的web图形。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信