O. Vikas, Nitin Chiluka, Purushottam K. Ray, Girraj Meena, A. Meshram, Amit Gupta, Abhishek Sisodia
Title: WebMiner--Anatomy of Super Peer Based Incremental Topic-Specific Web Crawler
Venue: Sixth International Conference on Networking (ICN'07)
Published: 2007-04-22
DOI: 10.1109/ICN.2007.104 (https://doi.org/10.1109/ICN.2007.104)
Citations: 8
Abstract
This paper introduces "WebMiner", a super-peer based P2P system for building an incremental topic-specific Web crawler. The system develops a topic-based repository of Web pages that would later be used in the construction of ontologies. Current crawlers suffer from a centralized architecture, with a single point of failure and heavy load. Super-peer systems strike a balance between the inherent efficiency of centralized search and the autonomy, load balancing, and robustness to attacks provided by distributed search, while taking advantage of the heterogeneity of capabilities across peers. In this paper, we discuss the architecture of WebMiner in detail, including the construction of the super-peer overlay network and the working of the system, which includes a feature for crawling the hidden Web.