A Comprehensive Evaluation of HTTP Header Features for Detecting Malicious Websites

John McGahagan IV, Darshan Bhansali, Margaret Gratian, M. Cukier
Published in: 2019 15th European Dependable Computing Conference (EDCC), 2019-09-01
DOI: https://doi.org/10.1109/EDCC.2019.00025
Citations: 8

Abstract

Security researchers have used website features including the URL, webpage content, HTTP headers, and others to detect malicious websites. In prior research, features derived from HTTP headers have shown promise for malicious website detection. This paper includes a comprehensive evaluation of HTTP header features to assess whether additional HTTP header features improve malicious website detection. We analyze HTTP headers from 6,021 malicious and 39,853 benign websites. We define malicious websites as those identified by Cisco Talos Threat Intelligence Group for association with phishing, drive-by downloads, and command and control infrastructure. Benign websites consist of popular websites from the Alexa Traffic Rank. We collect 672 HTTP header features from these websites and identify 22 for further analysis. Among these, 11 have been studied in prior research while the other 11 are new and identified in our research. From these 22 features, eight features, three identified by our study, consistently rank as the most important features and represent 80% of the total feature importance. We build eight models with supervised learning techniques and observe that the detection performance metrics for the 22 features are consistently better than for the 11 previously studied features. We also apply two feature transformation techniques and find that performing Principal Component Analysis on the features identified increases detection ability. From our results, we postulate that use of additional HTTP header features will lead to more accurate detection of malicious websites.
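
To make two ideas from the abstract concrete, the sketch below shows (1) turning one response's HTTP headers into numeric features and (2) finding the smallest set of top-ranked features that accounts for a given share of total feature importance, such as the 80% figure mentioned above. This is an illustration under stated assumptions, not the authors' pipeline: the function names `header_features` and `features_covering` are hypothetical, and the four example features are placeholders rather than the 22 features selected in the paper.

```python
from typing import Dict, List


def header_features(headers: Dict[str, str]) -> Dict[str, float]:
    """Turn one response's headers into a small numeric feature vector.

    The features chosen here are illustrative assumptions, not the
    feature set evaluated in the paper.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "header_count": float(len(lowered)),
        "has_content_length": float("content-length" in lowered),
        "has_cache_control": float("cache-control" in lowered),
        "server_value_length": float(len(lowered.get("server", ""))),
    }


def features_covering(importances: Dict[str, float], share: float = 0.8) -> List[str]:
    """Return the fewest top-ranked features whose summed importance
    reaches `share` of the total importance."""
    total = sum(importances.values())
    if total <= 0:
        return []
    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    running = 0.0
    for name, weight in ranked:
        selected.append(name)
        running += weight
        if running >= share * total:
            break
    return selected
```

In practice the importance scores passed to `features_covering` would come from a trained model (for example, a tree ensemble's per-feature importances); the abstract does not specify which models or importance measure were used, so that step is left out here.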