A Comprehensive Evaluation of HTTP Header Features for Detecting Malicious Websites

2019 15th European Dependable Computing Conference (EDCC), Pub Date: 2019-09-01, DOI: 10.1109/EDCC.2019.00025

John McGahagan IV, Darshan Bhansali, Margaret Gratian, M. Cukier


Citations: 8

Abstract

Security researchers have used website features including the URL, webpage content, HTTP headers, and others to detect malicious websites. In prior research, features derived from HTTP headers have shown promise for malicious website detection. This paper includes a comprehensive evaluation of HTTP header features to assess whether additional HTTP header features improve malicious website detection. We analyze HTTP headers from 6,021 malicious and 39,853 benign websites. We define malicious websites as those identified by Cisco Talos Threat Intelligence Group for association with phishing, drive-by downloads, and command and control infrastructure. Benign websites consist of popular websites from the Alexa Traffic Rank. We collect 672 HTTP header features from these websites and identify 22 for further analysis. Among these, 11 have been studied in prior research while the other 11 are new and identified in our research. From these 22 features, eight features, three identified by our study, consistently rank as the most important features and represent 80% of the total feature importance. We build eight models with supervised learning techniques and observe that the detection performance metrics for the 22 features are consistently better than for the 11 previously studied features. We also apply two feature transformation techniques and find that performing Principal Component Analysis on the features identified increases detection ability. From our results, we postulate that use of additional HTTP header features will lead to more accurate detection of malicious websites.
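The two steps the abstract describes, deriving features from HTTP headers and identifying the features that account for 80% of total feature importance, can be illustrated with a minimal Python sketch. The abstract does not list the 22 selected features, so the header names in `EXAMPLE_HEADERS`, the feature names, and both helper functions are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical header names for illustration only; the abstract does not
# say which headers the authors' 22 features are derived from.
EXAMPLE_HEADERS = ["server", "content-length", "cache-control", "x-powered-by"]


def header_features(headers: Dict[str, str]) -> Dict[str, float]:
    """Turn one response's headers into a flat dict of numeric features."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    features: Dict[str, float] = {"header_count": float(len(lowered))}
    for name in EXAMPLE_HEADERS:
        features[f"has_{name}"] = 1.0 if name in lowered else 0.0
    # Use -1.0 as a sentinel when Content-Length is absent or malformed.
    try:
        features["content_length"] = float(int(lowered.get("content-length", "")))
    except ValueError:
        features["content_length"] = -1.0
    return features


def top_features_by_share(
    importances: Dict[str, float], share: float = 0.8
) -> List[Tuple[str, float]]:
    """Return the top-ranked features whose importances reach `share` of the total."""
    total = sum(importances.values())
    if total <= 0:
        return []
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected: List[Tuple[str, float]] = []
    running = 0.0
    for name, value in ranked:
        selected.append((name, value))
        running += value
        if running >= share * total:
            break
    return selected
```

In such a setup, `header_features` would be applied to each collected response to build the feature matrix, and `top_features_by_share` would be applied to the importance scores reported by a trained model to see how few features carry most of the weight.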
