{"title":"LiveRank: How to Refresh Old Datasets","authors":"The Dang Huynh, F. Mathieu, L. Viennot","doi":"10.1080/15427951.2015.1098756","DOIUrl":null,"url":null,"abstract":"ABSTRACT This article considers the problem of refreshing a dataset. More precisely, given a collection of nodes gathered at some time (webpages, users from an online social network) along with some structure (hyperlinks, social relationships), we want to identify a significant fraction of the nodes that still exist at present time. The liveness of an old node can be tested through an online query at present time. We call LiveRank a ranking of the old pages so that active nodes are more likely to appear first. The quality of a LiveRank is measured by the number of queries necessary to identify a given fraction of the active nodes when using the LiveRank order. We study different scenarios from a static setting where the LiveRank is computed before any query is made, to dynamic settings where the LiveRank can be updated as queries are processed. Our results show that building on the PageRank can lead to efficient LiveRanks, for web graphs as well as for online social networks.","PeriodicalId":38105,"journal":{"name":"Internet Mathematics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15427951.2015.1098756","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Internet Mathematics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/15427951.2015.1098756","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Mathematics","Score":null,"Total":0}
引用次数: 0
Abstract
ABSTRACT This article considers the problem of refreshing a dataset. More precisely, given a collection of nodes gathered at some time (webpages, users from an online social network) along with some structure (hyperlinks, social relationships), we want to identify a significant fraction of the nodes that still exist at present time. The liveness of an old node can be tested through an online query at present time. We call LiveRank a ranking of the old pages so that active nodes are more likely to appear first. The quality of a LiveRank is measured by the number of queries necessary to identify a given fraction of the active nodes when using the LiveRank order. We study different scenarios from a static setting where the LiveRank is computed before any query is made, to dynamic settings where the LiveRank can be updated as queries are processed. Our results show that building on the PageRank can lead to efficient LiveRanks, for web graphs as well as for online social networks.