{"title":"An efficient method in pre-processing phase of mining suspicious web crawlers","authors":"Miron Catalin, Aflori Cristian","doi":"10.1109/ICSTCC.2017.8107046","DOIUrl":null,"url":null,"abstract":"The reports from last years outline the fact that the web crawlers (robots, bots) activities generate more than a half of web traffic on Internet. Web robots can be good (used for example by search engines) or bad (for bypassing security solutions, scraping, spamming or hacking), but usually all take up the internet bandwidth and can cause damage to businesses that rely on web traffic or content. Sorting human online traffic from bot activity isn't an easy task. The constantly evolving range of attacks, and the continuous optimization of bots, pose a new set of challenges. Our proposal is the first step for a larger automated solution that implies using the various intrusion detection system (IDS) methods and tools combined with mining algorithms. The final objective of the solution is the automated and effective detection of real bad bot threats for taking the appropriate security measures. The method proposes an automated flow from capturing the network traffic to the extraction of the input data to the mining algorithms (as the pre-processing step) and also an initial pattern detection and visualization with the scope of identifying potential threats generated by suspicious web crawlers. 
The first results are encouraging and represent the initial phase of identifying the potential threats from bad web robots.","PeriodicalId":374572,"journal":{"name":"2017 21st International Conference on System Theory, Control and Computing (ICSTCC)","volume":"256 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 21st International Conference on System Theory, Control and Computing (ICSTCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSTCC.2017.8107046","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
Reports from recent years indicate that web crawler (robot, bot) activity generates more than half of all web traffic on the Internet. Web robots can be benign (used, for example, by search engines) or malicious (used to bypass security solutions, scrape content, send spam, or hack), but all of them consume bandwidth and can damage businesses that rely on web traffic or content. Separating human traffic from bot activity is not an easy task: the constantly evolving range of attacks and the continuous optimization of bots pose a new set of challenges. Our proposal is the first step toward a larger automated solution that combines various intrusion detection system (IDS) methods and tools with mining algorithms. The final objective of the solution is the automated, effective detection of genuinely harmful bot threats so that appropriate security measures can be taken. The method defines an automated flow from capturing network traffic to extracting the input data for the mining algorithms (the pre-processing step), together with initial pattern detection and visualization aimed at identifying potential threats generated by suspicious web crawlers. The first results are encouraging and represent the initial phase of identifying potential threats from malicious web robots.
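The abstract describes a pre-processing flow that turns captured traffic into input features for mining algorithms. As a minimal sketch of what such a step might look like, the snippet below parses web-server access-log lines and aggregates per-client features commonly associated with crawler behavior. The log format (Apache combined) and the specific features are illustrative assumptions, not the paper's actual pipeline.

```python
import re
from collections import defaultdict

# Regex for the Apache "combined" access-log format (an assumption;
# the abstract does not specify the capture source or its fields).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def extract_features(log_lines):
    """Aggregate per-IP features usable as input to a mining algorithm."""
    sessions = defaultdict(lambda: {"requests": 0, "errors": 0,
                                    "robots_txt": 0, "no_referrer": 0})
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not parse
        s = sessions[m["ip"]]
        s["requests"] += 1
        if m["status"].startswith("4"):
            s["errors"] += 1          # many 4xx responses suggest probing
        if m["path"].endswith("/robots.txt"):
            s["robots_txt"] += 1      # robots.txt access hints at a crawler
        if m["referrer"] in ("", "-"):
            s["no_referrer"] += 1     # bots often send no referrer
    return dict(sessions)

sample = [
    '10.0.0.5 - - [01/Oct/2017:12:00:00 +0000] '
    '"GET /robots.txt HTTP/1.1" 200 123 "-" "SomeBot/1.0"',
    '10.0.0.5 - - [01/Oct/2017:12:00:01 +0000] '
    '"GET /admin HTTP/1.1" 404 50 "-" "SomeBot/1.0"',
]
features = extract_features(sample)
```

The resulting per-IP feature vectors (request count, error rate, robots.txt hits, missing-referrer ratio) could then be fed to a clustering or classification algorithm to flag suspicious crawlers.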