AggregateRank: bringing order to web sites

Guang Feng, Tie-Yan Liu, Y. Wang, Ying Bao, Zhiming Ma, Xu-Dong Zhang, Wei-Ying Ma
{"title":"AggregateRank: bringing order to web sites","authors":"Guang Feng, Tie-Yan Liu, Y. Wang, Ying Bao, Zhiming Ma, Xu-Dong Zhang, Wei-Ying Ma","doi":"10.1145/1148170.1148187","DOIUrl":null,"url":null,"abstract":"Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web search and crawling. In order to get the ranks of websites, researchers used to describe the inter-connectivity among websites with a so-called HostGraph in which the nodes denote websites and the edges denote linkages between websites (if and only if there are hyperlinks from the pages in one website to the pages in the other, there will be an edge between these two websites), and then adopted the random walk model in the HostGraph. However, as pointed in this paper, the random walk over such a HostGraph is not reasonable because it is not in accordance with the browsing behavior of web surfers. Therefore, the derivate rank cannot represent the true probability of visiting the corresponding website.In this work, we mathematically proved that the probability of visiting a website by the random web surfer should be equal to the sum of the PageRank values of the pages inside that website. Nevertheless, since the number of web pages is much larger than that of websites, it is not feasible to base the calculation of the ranks of websites on the calculation of PageRank. To tackle this problem, we proposed a novel method named AggregateRank rooted in the theory of stochastic complement, which cannot only approximate the sum of PageRank accurately, but also have a lower computational complexity than PageRank. Both theoretical analysis and experimental evaluation show that AggregateRank is a better method for ranking websites than previous methods.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1148170.1148187","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29

Abstract

Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web search and crawling. In order to get the ranks of websites, researchers used to describe the inter-connectivity among websites with a so-called HostGraph in which the nodes denote websites and the edges denote linkages between websites (if and only if there are hyperlinks from the pages in one website to the pages in the other, there will be an edge between these two websites), and then adopted the random walk model in the HostGraph. However, as pointed in this paper, the random walk over such a HostGraph is not reasonable because it is not in accordance with the browsing behavior of web surfers. Therefore, the derivate rank cannot represent the true probability of visiting the corresponding website.In this work, we mathematically proved that the probability of visiting a website by the random web surfer should be equal to the sum of the PageRank values of the pages inside that website. Nevertheless, since the number of web pages is much larger than that of websites, it is not feasible to base the calculation of the ranks of websites on the calculation of PageRank. To tackle this problem, we proposed a novel method named AggregateRank rooted in the theory of stochastic complement, which cannot only approximate the sum of PageRank accurately, but also have a lower computational complexity than PageRank. Both theoretical analysis and experimental evaluation show that AggregateRank is a better method for ranking websites than previous methods.
AggregateRank:给网站带来秩序
由于网站是Web最重要的组织结构之一,因此如何有效地对网站进行排名对于许多Web应用程序(如Web搜索和爬行)至关重要。为了得到网站的等级,研究者用一个所谓的HostGraph来描述网站之间的连通性,其中节点表示网站,边表示网站之间的链接(当且仅当一个网站的页面与另一个网站的页面之间存在超链接时,这两个网站之间会有一条边),然后在HostGraph中采用随机漫步模型。然而,正如本文所指出的,在这样一个HostGraph上随机游走是不合理的,因为它不符合网络冲浪者的浏览行为。因此,衍生排名不能代表访问相应网站的真实概率。在这项工作中,我们从数学上证明了随机网络冲浪者访问网站的概率应该等于该网站内页面PageRank值的总和。然而,由于网页的数量远远大于网站的数量,因此将网站排名的计算建立在PageRank的计算基础上是不可行的。为了解决这一问题,我们提出了一种基于随机互补理论的新方法AggregateRank,它不仅能准确地近似PageRank的总和,而且比PageRank具有更低的计算复杂度。理论分析和实验评价表明,AggregateRank是一种较好的网站排名方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信