An extensive study of Web robots traffic

M. Calzarossa, L. Massari, D. Tessera
International Conference on Information Integration and Web-based Applications & Services
Published: 2013-12-02
DOI: https://doi.org/10.1145/2539150.2539161
Citations: 8

Abstract

The traffic produced by the periodic crawling activities of Web robots often accounts for a substantial fraction of a website's overall traffic and therefore has a non-negligible effect on its performance. Our study focuses on the traffic generated on the SPEC website by many different Web robots, including, among others, the robots employed by some popular search engines. This extensive investigation shows that the behavior and crawling patterns of the robots vary significantly in terms of the requests, resources and clients involved in their crawling activities. Some robots tend to concentrate their requests in short periods of time and follow deterministic patterns characterized by multiple peaks. The requests of other robots exhibit time-dependent behavior and repeated patterns with some periodicity. We represent the traffic as a time series modelled in the frequency domain. The identified models, consisting of trigonometric polynomials and Autoregressive Moving Average (ARMA) components, accurately summarize the behavior of the overall traffic as well as the traffic of individual robots. These models can easily be used as a basis for forecasting.
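
The modelling idea summarized in the abstract (periodic components captured by a trigonometric polynomial, with the remaining correlation handled by an autoregressive moving average term) can be illustrated with a minimal sketch. This is not the authors' procedure: the synthetic hourly series, the function names, the choice of two harmonics, and the use of a plain AR(1) fit as a stand-in for a full ARMA model are all assumptions made for illustration.

```python
import numpy as np


def simulate_traffic(n_hours=672, seed=0):
    """Hypothetical hourly request counts: daily + weekly cycles plus AR(1) noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_hours)
    eps = rng.normal(0.0, 3.0, n_hours)
    noise = np.zeros(n_hours)
    noise[0] = eps[0]
    for i in range(1, n_hours):
        noise[i] = 0.6 * noise[i - 1] + eps[i]  # AR(1) with coefficient 0.6
    daily = 30.0 * np.cos(2 * np.pi * t / 24)
    weekly = 15.0 * np.sin(2 * np.pi * t / 168)
    return 100.0 + daily + weekly + noise


def dominant_periods(series, k=2):
    """Return the k periods (in samples) with the largest periodogram power."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    # Skip index 0 (zero frequency), then take the k strongest bins.
    idx = np.argsort(power[1:])[::-1][:k] + 1
    return [1.0 / freqs[i] for i in idx]


def fit_trig_polynomial(series, periods):
    """Least-squares fit of a constant plus one cos/sin pair per period."""
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    cols = [np.ones(len(x))]
    for p in periods:
        w = 2 * np.pi / p
        cols.append(np.cos(w * t))
        cols.append(np.sin(w * t))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return coef, design @ coef


def fit_ar1(residuals):
    """Lag-1 least-squares AR coefficient (a simplified stand-in for ARMA)."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return float(r[1:] @ r[:-1] / (r[:-1] @ r[:-1]))


if __name__ == "__main__":
    y = simulate_traffic()
    periods = dominant_periods(y, k=2)
    coef, fitted = fit_trig_polynomial(y, periods)
    phi = fit_ar1(y - fitted)
    print("dominant periods (hours):", [round(p, 1) for p in periods])
    print("estimated AR(1) coefficient:", round(phi, 3))
```

Once fitted, the trigonometric part can be extrapolated by evaluating the same cosine and sine terms at future time indices, which is what makes this kind of model convenient as a basis for forecasting. A real analysis would select the number of harmonics and the ARMA orders from the data rather than fixing them in advance as done here.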