Web Crawling

IF 8.3 | Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Christopher Olston, Marc Najork
DOI: 10.1561/1500000017 (https://doi.org/10.1561/1500000017)
Journal: Foundations and Trends in Information Retrieval, volume listed as "20 1", pages 175-246
Publication date: 2010-03-01
Publication type: Journal Article
Open access: no
Citations: 2

Abstract

This is a survey of the science and practice of web crawling. While at first glance web crawling may appear to be merely an application of breadth-first search, the truth is that there are many challenges ranging from systems concerns such as managing very large data structures to theoretical questions such as how often to revisit evolving content sources. This survey outlines the fundamental challenges and describes the state-of-the-art models and solutions. It also highlights avenues for future work.
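To make the "breadth-first search" framing in the abstract concrete, here is a minimal sketch of a breadth-first crawl over a toy in-memory link graph. It is not taken from the survey: the names `bfs_crawl`, `fetch_links`, and `TOY_WEB` are hypothetical, and the link graph stands in for real HTTP fetching and link extraction.

```python
from collections import deque


def bfs_crawl(seeds, fetch_links, max_pages=100):
    """Visit pages breadth-first and return the URLs in visit order.

    `fetch_links` is an assumed callback that returns the outgoing
    links of a URL; a real crawler would download and parse the page.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    frontier = deque(seeds)             # FIFO queue of URLs still to visit
    seen = set(seeds)                   # URLs already queued or visited
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical four-page web: each key links to the pages in its list.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

visit_order = bfs_crawl(["a"], lambda url: TOY_WEB.get(url, []))
print(visit_order)  # ['a', 'b', 'c', 'd']
```

In this sketch both `frontier` and `seen` live in memory, which only works for a toy graph. At web scale these are the "very large data structures" the abstract refers to, and the sketch visits each page once, so it says nothing about how often to revisit changing content.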
Source journal
Foundations and Trends in Information Retrieval (COMPUTER SCIENCE, INFORMATION SYSTEMS)
CiteScore: 39.10
Self-citation rate: 0.00%
Articles published: 3
About the journal: The surge in research across all domains over the past decade has produced a plethora of new publications and exponential growth in published research. Navigating this literature and staying current has become time-consuming. Electronic publishing gives instant access to more articles than ever, but identifying the ones essential to a comprehensive understanding of a topic remains difficult. Foundations and Trends® in Information Retrieval (FnTIR) addresses this by publishing high-quality survey and tutorial monographs in the field. Each issue features a 50-100 page monograph written by research leaders, covering tutorial subjects, research retrospectives, and survey papers that provide state-of-the-art reviews within the scope of the journal.