Using mobile crawlers to search the Web efficiently

J. Hammer, Jan Fiedler
ACIS Int. J. Comput. Inf. Sci., vol. 7, no. 1, published 2000-11-01 (journal article)
DOI: 10.5555/543101.543105 (https://doi.org/10.5555/543101.543105)
Citations: 52

Abstract

Due to the enormous growth of the World Wide Web, search engines have become indispensable tools for Web navigation. In order to provide powerful search facilities, search engines maintain comprehensive indices for documents and their contents on the Web by continuously downloading Web pages for processing. In this paper, we demonstrate an alternative, more efficient approach to the “download-first process-later” strategy of existing search engines by using mobile crawlers. The major advantage of the mobile approach is that the analysis portion of the crawling process is done locally where the data resides rather than remotely inside the Web search engine. This can significantly reduce network load which, in turn, can improve the performance of the crawling process. In this report, we provide a detailed description of our architecture supporting mobile Web crawling and report on its novel features as well as the rationale behind some of the important design decisions that were driving our development. In order to demonstrate the viability of our approach and to validate our mobile crawling architecture, we have implemented a prototype that uses the UF intranet as its testbed. Based on this experimental prototype, we conducted a detailed evaluation of the benefits of mobile Web crawling.
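The abstract's central claim is that analyzing pages where they reside, rather than downloading them first, cuts network load. That claim can be illustrated with a toy, in-process simulation. The sketch below is not the authors' architecture or prototype. Every function name, the JSON-size cost model, and the fixed `agent_bytes` migration cost are assumptions made for illustration. Both modes run the same analysis (markup stripping plus term counting), and the sketch counts the bytes that would cross the network in each case.

```python
# Toy comparison of "download-first, process-later" crawling with a mobile
# crawler that analyzes pages on the host. All names are hypothetical, and
# network transfer is only simulated by counting UTF-8 bytes.
import json
import re
from collections import Counter
from typing import Dict, Tuple

Pages = Dict[str, str]             # url -> raw HTML stored on one simulated host
Index = Dict[str, Dict[str, int]]  # url -> term frequencies

TAG_RE = re.compile(r"<[^>]+>")    # crude markup stripper, adequate for a toy
WORD_RE = re.compile(r"[a-z]+")


def analyze(html: str) -> Dict[str, int]:
    """Strip tags and count lowercase terms on one page."""
    text = TAG_RE.sub(" ", html).lower()
    return dict(Counter(WORD_RE.findall(text)))


def nbytes(obj) -> int:
    """UTF-8 size of obj serialized as JSON; a stand-in for transfer cost."""
    return len(json.dumps(obj, sort_keys=True).encode("utf-8"))


def stationary_crawl(host: Pages) -> Tuple[Index, int]:
    """Download-first, process-later: every raw page crosses the network."""
    transferred = sum(len(html.encode("utf-8")) for html in host.values())
    index = {url: analyze(html) for url, html in host.items()}  # runs at the engine
    return index, transferred


def mobile_crawl(host: Pages, agent_bytes: int = 2_000) -> Tuple[Index, int]:
    """Ship the analysis to the host; only the agent and its index travel.

    agent_bytes is an ASSUMED fixed cost of migrating the crawler code.
    """
    index = {url: analyze(html) for url, html in host.items()}  # runs "on the host"
    transferred = agent_bytes + nbytes(index)
    return index, transferred


if __name__ == "__main__":
    # Hypothetical host: markup-heavy pages with little indexable text.
    page = ("<html><body>" + '<div class="layout"></div>' * 50
            + "<p>mobile crawlers search the web</p></body></html>")
    host = {f"http://example.edu/p{i}.html": page for i in range(100)}
    for name, crawl in (("stationary", stationary_crawl), ("mobile", mobile_crawl)):
        _, sent = crawl(host)
        print(f"{name:>10}: {sent} bytes transferred")
```

Under these assumptions the mobile crawler only wins when its returned index plus its own migration cost is smaller than the raw pages. For a host with a handful of tiny pages, the fixed agent cost dominates and the stationary crawler transfers less.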