A clickstream-based web page significance ranking metric for Web crawlers
Fatemeh Ahmadi-Abkenari, A. Selamat
2011 Malaysian Conference in Software Engineering, 2011-12-01
DOI: 10.1109/MYSEC.2011.6140674 (https://doi.org/10.1109/MYSEC.2011.6140674)
Citations: 8
Abstract
The unpredictable, rapid growth of the World Wide Web and its non-static nature create considerable obstacles for Web crawlers, including incorrect and irrelevant answers in search result sets and scaling issues. More promising solutions are therefore needed to provide more accurate search results. Existing Web page importance metrics, whether link-based or context-based, cannot by themselves resolve the coverage of authorized fresh Web content and the accuracy concerns when implemented within a parallel crawler, so employing these metrics is not the final approach within a search engine's architecture. This paper proposes analyzing clickstream data to discover the popularity of Web pages in the crawl frontier. It defines the metric and presents experimental results from ranking the UTM Web pages with it.
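
The abstract does not give the metric's formula, so the following is only a rough illustration of the general idea of ordering a crawl frontier by clickstream popularity, not the authors' metric. It assumes, purely for the sketch, that popularity is the raw number of visits to a page and that the clickstream is available as hypothetical `(session_id, url)` pairs; the function name `rank_frontier_by_clicks` and the sample URLs are likewise invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_frontier_by_clicks(
    clickstream: Iterable[Tuple[str, str]],
    frontier: Iterable[str],
) -> List[Tuple[str, int]]:
    """Order frontier URLs by visit count in the clickstream.

    ASSUMPTION: popularity == raw visit count. The paper's actual
    metric is not specified in the abstract and may differ.

    clickstream: (session_id, url) pairs, a simplified, hypothetical log format.
    frontier: URLs waiting to be crawled.
    """
    # Count how many times each URL was requested across all sessions.
    visits = Counter(url for _session, url in clickstream)
    # URLs never seen in the log get a count of 0 and therefore sort last.
    scored = [(url, visits[url]) for url in frontier]
    # Descending visit count; ties broken by URL so the order is repeatable.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample_log = [
        ("s1", "/home"),
        ("s1", "/admissions"),
        ("s2", "/home"),
        ("s3", "/research"),
        ("s3", "/home"),
    ]
    sample_frontier = ["/research", "/home", "/contact", "/admissions"]
    for url, count in rank_frontier_by_clicks(sample_log, sample_frontier):
        print(url, count)
```

A real metric would likely need more than raw counts (for example, session boundaries or time spent on a page), but the abstract gives no detail on that, so none is modeled here.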