An efficient scheme to remove crawler traffic from the Internet

Proceedings. Eleventh International Conference on Computer Communications and Networks · Pub Date: 2002-12-10 · DOI: 10.1109/ICCCN.2002.1043051

Xiaoqin Yuan, M. MacGregor, J. Harms


Citations: 24

Abstract


We estimate that approximately 40% of current Internet traffic is due to Web crawlers retrieving pages for indexing. We address this problem by introducing an efficient indexing system based on active networks. Our approach employs strategically placed active routers that constantly monitor passing Internet traffic, analyze it, and then transmit the index data to a dedicated back-end repository. Our simulations have shown that active indexing is up to 30% more efficient than the current crawler-based techniques.
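The data flow described in the abstract (a router observes a page in transit, analyzes it, and sends only index data to a back-end repository) can be illustrated with a toy sketch. This is not the authors' implementation: the class names `ActiveRouter` and `Repository`, the tokenizer, and the example URLs are all hypothetical, and a simple inverted index stands in for whatever index format the paper actually uses.

```python
import re
from collections import defaultdict


class Repository:
    """Hypothetical stand-in for the dedicated back-end repository."""

    def __init__(self):
        # Inverted index: term -> set of URLs whose pages contain it.
        self.postings = defaultdict(set)

    def receive(self, url, terms):
        for term in terms:
            self.postings[term].add(url)


class ActiveRouter:
    """Hypothetical router that indexes pages it forwards anyway,
    so no separate crawler request is needed for them."""

    def __init__(self, repository):
        self.repository = repository
        self.forwarded = 0

    def forward(self, url, body):
        # Analyze the passing page and ship only the index data
        # (assumed here to be the set of lowercase alphanumeric terms).
        terms = set(re.findall(r"[a-z0-9]+", body.lower()))
        self.repository.receive(url, terms)
        self.forwarded += 1
        return body  # the page continues to its destination unchanged


repo = Repository()
router = ActiveRouter(repo)
router.forward("http://example.org/a", "Active networks index passing traffic")
router.forward("http://example.org/b", "Crawlers fetch pages for indexing")
print(sorted(repo.postings["indexing"]))
```

The point of the sketch is only that the index is built from traffic users already generate, rather than from additional crawler fetches; placement of routers and the efficiency figures reported in the abstract are outside its scope.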

