A density based clustering approach for web robot detection
M. Zabihi, M. V. Jahan, J. Hamidzadeh
2014 4th International Conference on Computer and Knowledge Engineering (ICCKE), 2014-12-22
DOI: https://doi.org/10.1109/ICCKE.2014.6993362
Citations: 18
Abstract
The need to distinguish humans from Web robots, a concern of computer network security, has given rise to the robot detection problem. An accurate solution can protect Web sites from intrusion by malicious robots and improve Web server performance by prioritizing human users. In this article, we propose a density-based method called DBC_WRD (Density Based Clustering for Web Robot Detection) to discover Web robot traffic in two large real data sets. We treat visitors as spatial instances and introduce two new features to describe and distinguish them. These features are based on the behavioral patterns of Web visitors and remain invariant over time. To mitigate one of the disadvantages of DBSCAN, the density-based clustering algorithm used in this paper, we use only four features to reduce the dimensionality. According to supervised evaluations, DBC_WRD achieves a Jaccard metric of 96% and produces two clusters with entropy and purity of 0.0215 and 0.97, respectively. Comparisons further show that, in terms of clustering quality and accuracy, DBC_WRD performs better than state-of-the-art algorithms. Finally, we conclude that some popular non-malicious Web robots imitate human behavior, which makes them difficult to identify.
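The supervised measures named in the abstract (Jaccard, entropy, purity) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are common textbook forms and are assumptions: purity as the share of points carrying their cluster's majority label, entropy as the size-weighted base-2 label entropy per cluster, and Jaccard as the pair-counting coefficient. The function names and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _label_counts(clusters, labels):
    """Group true labels by cluster id and count them."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of points that carry the majority label of their cluster."""
    counts = _label_counts(clusters, labels)
    return sum(max(cnt.values()) for cnt in counts.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of per-cluster label entropy (base 2, assumed)."""
    n = len(labels)
    total = 0.0
    for cnt in _label_counts(clusters, labels).values():
        size = sum(cnt.values())
        h = -sum((k / size) * log2(k / size) for k in cnt.values())
        total += (size / n) * h
    return total


def pairwise_jaccard(clusters, labels):
    """Pair-counting Jaccard: pairs grouped together in both partitions,
    divided by pairs grouped together in at least one of them."""
    both = either = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_label = labels[i] == labels[j]
        if same_cluster and same_label:
            both += 1
        if same_cluster or same_label:
            either += 1
    return both / either if either else 1.0


# Hypothetical toy data: one robot session lands in the "human" cluster.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["human", "human", "robot", "robot", "robot", "robot"]

if __name__ == "__main__":
    print(purity(toy_clusters, toy_labels))
    print(entropy(toy_clusters, toy_labels))
    print(pairwise_jaccard(toy_clusters, toy_labels))
```

Under these definitions, lower entropy and higher purity both indicate clusters dominated by a single visitor class, which is the sense in which the reported 0.0215 and 0.97 describe a clean human/robot split.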