Detection and confirmation of web robot requests for cleaning the voluminous web log data

T. H. Sardar, Z. Ansari
{"title":"Detection and confirmation of web robot requests for cleaning the voluminous web log data","authors":"T. H. Sardar, Z. Ansari","doi":"10.1109/IMPETUS.2014.6775871","DOIUrl":null,"url":null,"abstract":"Web robots are software applications that run automated tasks over the internet. They traverse the hyperlink structure of the World Wide Web so that they can retrieve information. There are many reasons to distinguish web robot requests and user requests. Some tasks of web robots can be harmful to the web. Firstly, Web robots are employed for assemble business intelligence at e-commerce sites. In such a state of affairs, the e-commerce site may need to detect robots. Secondly, many e-commerce sites carry out Web traffic scrutiny to deduce the way their customers have accessed the site. Unfortunately, such scrutiny can be erroneous by the presence of Web robots. Thirdly, Web robots often consume considerable network bandwidth and server resources at the expense of other users. A web log file is a web server file automatically created and maintained by a web server to check the activity performed by it. It maintains a history of page requests on its site. In this paper we have used four methods together to detect and finally confirm requests as a robot request. Experiments have been performed on the log file generated from the server of an operational web site named vtulife.com which contains data of march-20l3. In our research results o.f web robot detection using various techniques have been compared and an integrated approach is proposed for the confirmation of the robot request.","PeriodicalId":153707,"journal":{"name":"2014 International Conference on the IMpact of E-Technology on US (IMPETUS)","volume":"83 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on the IMpact of E-Technology on US (IMPETUS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IMPETUS.2014.6775871","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

Abstract

Web robots are software applications that run automated tasks over the internet. They traverse the hyperlink structure of the World Wide Web in order to retrieve information. There are many reasons to distinguish web robot requests from user requests, since some tasks performed by web robots can be harmful to the web. First, web robots are employed to assemble business intelligence at e-commerce sites; in such a scenario, the e-commerce site may need to detect robots. Second, many e-commerce sites analyze their web traffic to deduce how customers have accessed the site; unfortunately, such analysis can be skewed by the presence of web robots. Third, web robots often consume considerable network bandwidth and server resources at the expense of other users. A web log file is a file automatically created and maintained by a web server to record its activity; it maintains a history of the page requests made to the site. In this paper we use four methods together to detect, and finally confirm, requests as robot requests. Experiments have been performed on the log file generated by the server of an operational web site, vtulife.com, containing data from March 2013. The results of web robot detection using the various techniques are compared, and an integrated approach is proposed for confirming robot requests.
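The abstract does not name the four detection methods, so the sketch below uses signals commonly combined in the web-robot-detection literature: requests for robots.txt, known robot user-agent strings, use of the HEAD method, and an empty referrer field. This is a minimal illustration of the "confirm by agreement of several methods" idea over Combined Log Format entries; the signal set, threshold, and all names here are assumptions, not the paper's exact procedure:

```python
import re

# Illustrative bot user-agent substrings (assumption, not the paper's list).
KNOWN_BOT_AGENTS = ("googlebot", "bingbot", "slurp", "crawler", "spider")

# Combined Log Format:
# host ident user [time] "request" status size "referer" "agent"
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def robot_signals(line: str) -> dict:
    """Return which robot-indicating signals fire for one log line."""
    m = CLF_PATTERN.match(line)
    if not m:
        return {}
    agent = m.group("agent").lower()
    return {
        # Well-behaved robots fetch robots.txt before crawling.
        "robots_txt": m.group("path") == "/robots.txt",
        # User-agent string matches a known robot.
        "bot_agent": any(b in agent for b in KNOWN_BOT_AGENTS),
        # Robots often probe resources with HEAD instead of GET.
        "head_method": m.group("method") == "HEAD",
        # Browsers following hyperlinks usually send a referrer.
        "empty_referer": m.group("referer") in ("", "-"),
    }

def is_confirmed_robot(line: str, threshold: int = 2) -> bool:
    """Confirm a request as a robot when several independent signals agree."""
    return sum(robot_signals(line).values()) >= threshold

sample = ('66.249.66.1 - - [15/Mar/2013:10:02:44 +0530] '
          '"GET /robots.txt HTTP/1.1" 200 310 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1)"')
print(is_confirmed_robot(sample))  # True: robots.txt + bot agent + empty referrer
```

Requiring several independent signals to agree before confirming a request as a robot mirrors the paper's theme of integrating multiple detection methods rather than trusting any single indicator, which keeps false positives low when cleaning the log data.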