Comparison of Scheduling Algorithms for Domain Specific Web Crawler
Krzysztof Filipowski
2014 European Network Intelligence Conference, published 2014-09-29
DOI: 10.1109/ENIC.2014.14 (https://doi.org/10.1109/ENIC.2014.14)
Citations: 7
Abstract
Domain-specific Web crawlers are effective tools for acquiring information from the Web. One of the most crucial factors influencing their efficiency is the choice of crawling strategy. This article describes and compares several strategies for domain-specific Web crawling, concentrating particularly on scheduling algorithms, which determine the order in which the URLs collected by the crawler are visited. The objective of these strategies is to download the most relevant Web pages at an early stage of the crawl. The paper presents four different algorithms and compares them using several metrics.
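To make the idea of a scheduling algorithm concrete, here is a minimal Python sketch. It is not one of the four algorithms from the paper, which the abstract does not name. It assumes a hypothetical in-memory link graph (TOY_GRAPH) and hypothetical relevance scores (TOY_SCORES) standing in for a topic classifier. A FIFO (breadth-first) scheduler is compared with a best-first scheduler in which each newly discovered URL inherits the relevance of the page that linked to it. Harvest rate, the fraction of downloaded pages that are relevant, is used here as an illustrative metric. It is an assumption, not the paper's stated set of metrics.

```python
import heapq
from collections import deque
from typing import Callable, Dict, List

# Hypothetical toy "web": each URL maps to the URLs it links to.
Graph = Dict[str, List[str]]


def bfs_order(graph: Graph, seed: str, budget: int) -> List[str]:
    """Baseline scheduler: visit URLs in the order they were discovered (FIFO)."""
    seen = {seed}
    frontier = deque([seed])
    order: List[str] = []
    while frontier and len(order) < budget:
        url = frontier.popleft()
        order.append(url)  # "download" the page
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


def best_first_order(graph: Graph, seed: str, budget: int,
                     relevance: Callable[[str], float]) -> List[str]:
    """Best-first scheduler (assumed heuristic): a URL inherits the relevance
    of the page that linked to it; the frontier yields the best URL first."""
    seen = {seed}
    counter = 0  # tie-breaker: equal priorities keep discovery order
    frontier = [(-1.0, counter, seed)]  # heapq is a min-heap, so negate scores
    order: List[str] = []
    while frontier and len(order) < budget:
        _, _, url = heapq.heappop(frontier)
        order.append(url)
        score = relevance(url)  # relevance of the page just downloaded
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                counter += 1
                heapq.heappush(frontier, (-score, counter, link))
    return order


def harvest_rate(order: List[str], relevance: Callable[[str], float],
                 threshold: float = 0.5) -> float:
    """Fraction of downloaded pages whose relevance reaches the threshold."""
    if not order:
        return 0.0
    hits = sum(1 for url in order if relevance(url) >= threshold)
    return hits / len(order)


# Hypothetical data: "a" is an off-topic branch, "b" an on-topic branch.
TOY_GRAPH: Graph = {
    "seed": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "a1": [], "a2": [], "b1": [], "b2": [],
}
TOY_SCORES = {"seed": 0.9, "a": 0.1, "b": 0.8,
              "a1": 0.1, "a2": 0.2, "b1": 0.9, "b2": 0.7}


def toy_relevance(url: str) -> float:
    """Stand-in for a topic classifier: look up the hypothetical score."""
    return TOY_SCORES.get(url, 0.0)


if __name__ == "__main__":
    bfs = bfs_order(TOY_GRAPH, "seed", budget=5)
    best = best_first_order(TOY_GRAPH, "seed", 5, toy_relevance)
    print(bfs, harvest_rate(bfs, toy_relevance))    # ['seed', 'a', 'b', 'a1', 'a2'] 0.4
    print(best, harvest_rate(best, toy_relevance))  # ['seed', 'a', 'b', 'b1', 'b2'] 0.8
```

With a five-page budget on this toy graph, best-first order reaches both relevant leaf pages and scores a harvest rate of 0.8, against 0.4 for breadth-first order. This only illustrates the stated goal of fetching relevant pages early. It is not a result reported in the paper.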