Using mobile crawlers to search the Web efficiently

ACIS Int. J. Comput. Inf. Sci., Pub Date: 2000-11-01, DOI: 10.5555/543101.543105

J. Hammer, Jan Fiedler


Citations: 52

Abstract

Due to the enormous growth of the World Wide Web, search engines have become indispensable tools for Web navigation. To provide powerful search facilities, search engines maintain comprehensive indices of documents and their contents on the Web by continuously downloading Web pages for processing. In this paper, we demonstrate an alternative, more efficient approach to the “download-first process-later” strategy of existing search engines by using mobile crawlers. The major advantage of the mobile approach is that the analysis portion of the crawling process is done locally, where the data resides, rather than remotely inside the Web search engine. This can significantly reduce network load, which, in turn, can improve the performance of the crawling process. We provide a detailed description of our architecture supporting mobile Web crawling and report on its novel features as well as the rationale behind some of the important design decisions that drove our development. To demonstrate the viability of our approach and to validate our mobile crawling architecture, we have implemented a prototype that uses the UF intranet as its testbed. Based on this experimental prototype, we conducted a detailed evaluation of the benefits of mobile Web crawling.
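The contrast between the two strategies can be illustrated with a toy model. The following Python sketch is not the authors' architecture or prototype; it is a minimal illustration under stated assumptions. The page set `REMOTE_PAGES`, the keyword filter `is_relevant`, and both crawl functions are hypothetical names introduced here, and the model ignores the cost of shipping the crawler code to the remote host as well as any compression of results.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the pages hosted on one remote Web server.
REMOTE_PAGES: Dict[str, str] = {
    "/index.html": "Welcome to the department home page. " * 20,
    "/courses.html": "Database systems course schedule and syllabus. " * 20,
    "/sports.html": "Football scores and ticket information. " * 20,
    "/research.html": "Research on database crawling and indexing. " * 20,
}


def is_relevant(text: str) -> bool:
    """Toy analysis step (an assumption): keep pages that mention databases."""
    return "database" in text.lower()


def stationary_crawl(
    pages: Dict[str, str], analyze: Callable[[str], bool]
) -> Tuple[List[str], int]:
    """Download-first, process-later: every page crosses the network,
    and the analysis runs afterwards at the search engine."""
    bytes_transferred = 0
    kept: List[str] = []
    for url, text in pages.items():
        bytes_transferred += len(text.encode("utf-8"))  # full download
        if analyze(text):
            kept.append(url)
    return sorted(kept), bytes_transferred


def mobile_crawl(
    pages: Dict[str, str], analyze: Callable[[str], bool]
) -> Tuple[List[str], int]:
    """Mobile approach: the analysis runs where the data resides,
    so only the pages that pass the filter are sent back."""
    kept = {url: text for url, text in pages.items() if analyze(text)}
    bytes_transferred = sum(len(t.encode("utf-8")) for t in kept.values())
    return sorted(kept), bytes_transferred


if __name__ == "__main__":
    for name, crawl in (("stationary", stationary_crawl), ("mobile", mobile_crawl)):
        urls, sent = crawl(REMOTE_PAGES, is_relevant)
        print(f"{name}: kept {urls}, transferred {sent} bytes")
```

Both functions select the same pages; they differ only in how many bytes leave the remote host. In this toy model the saving depends entirely on how selective the filter is: if every page is relevant, the two strategies transfer the same amount.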

