{"title":"一个有效的方案,以消除爬虫流量从互联网","authors":"Xiaoqin Yuan, M. MacGregor, J. Harms","doi":"10.1109/ICCCN.2002.1043051","DOIUrl":null,"url":null,"abstract":"We estimate that approximately 40% of current Internet traffic is due to Web crawlers retrieving pages for indexing. We address this problem by introducing an efficient indexing system based on active networks. Our approach employs strategically placed active routers that constantly monitor passing Internet traffic, analyze it, and then transmit the index data to a dedicated back-end repository. Our simulations have shown that active indexing is up to 30% more efficient than the current crawler-based techniques.","PeriodicalId":302787,"journal":{"name":"Proceedings. Eleventh International Conference on Computer Communications and Networks","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"An efficient scheme to remove crawler traffic from the Internet\",\"authors\":\"Xiaoqin Yuan, M. MacGregor, J. Harms\",\"doi\":\"10.1109/ICCCN.2002.1043051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We estimate that approximately 40% of current Internet traffic is due to Web crawlers retrieving pages for indexing. We address this problem by introducing an efficient indexing system based on active networks. Our approach employs strategically placed active routers that constantly monitor passing Internet traffic, analyze it, and then transmit the index data to a dedicated back-end repository. Our simulations have shown that active indexing is up to 30% more efficient than the current crawler-based techniques.\",\"PeriodicalId\":302787,\"journal\":{\"name\":\"Proceedings. Eleventh International Conference on Computer Communications and Networks\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. Eleventh International Conference on Computer Communications and Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCCN.2002.1043051\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Eleventh International Conference on Computer Communications and Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCCN.2002.1043051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An efficient scheme to remove crawler traffic from the Internet
We estimate that approximately 40% of current Internet traffic is due to Web crawlers retrieving pages for indexing. We address this problem by introducing an efficient indexing system based on active networks. Our approach employs strategically placed active routers that constantly monitor passing Internet traffic, analyze it, and then transmit the index data to a dedicated back-end repository. Our simulations have shown that active indexing is up to 30% more efficient than the current crawler-based techniques.