Pedro Marques, Zayani Dabbabi, Miruna-Mihaela Mironescu, Olivier Thonnard, A. Bessani, F. Buontempo, Ilir Gashi
{"title":"Detecting Malicious Web Scraping Activity: A Study with Diverse Detectors","authors":"Pedro Marques, Zayani Dabbabi, Miruna-Mihaela Mironescu, Olivier Thonnard, A. Bessani, F. Buontempo, Ilir Gashi","doi":"10.1109/PRDC.2018.00049","DOIUrl":null,"url":null,"abstract":"We present results on the use of diverse monitoring tools for the detection of malicious web scraping activity. We have carried out an analysis of a real dataset of Apache HTTP Access logs for an e-commerce application provided by a large multinational IT provider for the global travel and tourism industry. Two tools have been used to detect scraping activities based on the HTTP requests: a commercial tool, and an in-house tool called Arcane. We show the benefits that can be achieved through the use of both systems, in terms of overall sensitivity and specificity, and we discuss the potential sources of diversity between the tool's alert patterns.","PeriodicalId":409301,"journal":{"name":"2018 IEEE 23rd Pacific Rim International Symposium on Dependable Computing (PRDC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 23rd Pacific Rim International Symposium on Dependable Computing (PRDC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PRDC.2018.00049","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
We present results on the use of diverse monitoring tools for the detection of malicious web scraping activity. We have carried out an analysis of a real dataset of Apache HTTP Access logs for an e-commerce application provided by a large multinational IT provider for the global travel and tourism industry. Two tools have been used to detect scraping activities based on the HTTP requests: a commercial tool, and an in-house tool called Arcane. We show the benefits that can be achieved through the use of both systems, in terms of overall sensitivity and specificity, and we discuss the potential sources of diversity between the tool's alert patterns.