Indexing the web - a challenge for supercomputers

M. Henzinger
Proceedings. IEEE International Conference on Cluster Computing, 2002-09-23. DOI: 10.1109/CLUSTR.2002.1137763
Citations: 5

Abstract

Since January 2002, the Google search engine has been powering an average of 150 million web searches a day, with a peak of over 2000 searches per second. These searches are performed over an index of over 2 billion documents, over 300 million images, and over 700 million Usenet messages. To guarantee fast user response time, Google performs these searches on a cluster of over 10,000 PCs. The main challenges with this architecture are fault tolerance and the quality of search results. Replication addresses the former, and the PageRank score is used to improve the latter. The PageRank score, which is based on an eigenvalue computation of a large matrix derived from the web graph, is one of the main contributors to very high-quality search results. As Internet use continues to grow, so does the use of the Google search engine. The Google architecture is designed to scale to accommodate the growth in usage as well as the growth of the web.
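The abstract only states that PageRank comes from an eigenvalue computation on a matrix derived from the web graph. The sketch below illustrates that idea with power iteration, which converges to the principal eigenvector of the PageRank matrix. It assumes the formulation from Brin and Page's original PageRank paper: a damping factor of 0.85 and uniform redistribution of the score held by pages without outlinks. These parameters, the function name `pagerank`, and the example graph are hypothetical choices for illustration, not details from this paper or Google's production system.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,   # assumed damping factor (not given in the abstract)
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Power iteration toward the principal eigenvector of the PageRank matrix.

    links maps each page to the pages it links to. Pages with no outlinks
    (dangling nodes) spread their score uniformly over all pages.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Total score currently held by pages with no outlinks.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleport term plus the uniformly redistributed dangling score.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped score evenly among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        # Stop once the L1 change between iterations falls below tol.
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    scores = pagerank(web)
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

In this toy graph, page `c` receives links from three pages and ends up with the highest score, while `d`, which no page links to, keeps only the baseline teleport share of 0.15/4. At web scale the dictionaries would presumably be replaced by a sparse matrix partitioned across many machines; the sketch only shows the iteration itself.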