A mathematical model for crawler revisit frequency
A. Dixit, A. Sharma
2010 IEEE 2nd International Advance Computing Conference (IACC), 2010-03-01
DOI: 10.1109/IADCC.2010.5422936 (https://doi.org/10.1109/IADCC.2010.5422936)
Citations: 15
Abstract
The expansion of the WWW, coupled with the high change frequency of web pages, poses a challenge for maintaining and fetching up-to-date information. Traditional crawling methods can no longer keep up with this constantly updating and growing web. Alternative distributed crawling schemes that use migrating crawlers try to maximize network utilization by minimizing network load, but they are hampered by deficiencies in their web page refresh techniques. The absence of effective measures to verify whether a web page has changed is another challenge. This paper proposes an efficient approach for computing revisit frequency: web pages that are updated frequently are detected, and the revisit frequency for each page is computed dynamically in response.
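
The abstract does not give the model itself, so the following is only a minimal sketch of the general idea of change-driven revisit scheduling, not the authors' formula. It assumes a content digest is an acceptable change test and uses a simple multiplicative rule: the revisit interval shrinks when a page is found changed and grows when it is not. The names `PageState`, `fingerprint`, `update_interval`, and the parameters `factor`, `min_interval`, and `max_interval` are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PageState:
    """Per-page bookkeeping (all fields are illustrative assumptions)."""
    interval: float        # current revisit interval, in hours
    fingerprint: str = ""  # digest of the content seen on the last visit


def fingerprint(content: bytes) -> str:
    """Digest used to decide whether a page changed between visits."""
    return hashlib.sha256(content).hexdigest()


def update_interval(state: PageState, content: bytes,
                    factor: float = 2.0,
                    min_interval: float = 1.0,
                    max_interval: float = 168.0) -> bool:
    """Adjust the revisit interval in place; return True if the page changed.

    Assumed heuristic, not the paper's model: divide the interval by
    `factor` on a detected change, multiply it by `factor` otherwise,
    and clamp the result to [min_interval, max_interval].
    """
    new_fp = fingerprint(content)
    first_visit = state.fingerprint == ""
    changed = (not first_visit) and new_fp != state.fingerprint
    if changed:
        # Page is changing: revisit sooner.
        state.interval = max(min_interval, state.interval / factor)
    elif not first_visit:
        # Page is stable: revisit less often.
        state.interval = min(max_interval, state.interval * factor)
    state.fingerprint = new_fp
    return changed
```

A crawler using this sketch would call `update_interval` after each fetch and schedule the next visit `state.interval` hours later. The first visit only records the digest, since there is nothing to compare against yet.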