Server-Side Prediction of Source IP Addresses Using Density Estimation

Markus Goldstein, Matthias Reif, A. Stahl, T. Breuel
{"title":"Server-Side Prediction of Source IP Addresses Using Density Estimation","authors":"Markus Goldstein, Matthias Reif, A. Stahl, T. Breuel","doi":"10.1109/ARES.2009.113","DOIUrl":null,"url":null,"abstract":"Source IP addresses are often used as a major feature for user modeling in computer networks. Particularly in the field of Distributed Denial of Service (DDoS) attack detection and mitigation traffic models make extensive use of source IP addresses for detecting anomalies. Typically the real IP address distribution is strongly undersampled due to a small amount of observations. Density estimation overcomes this shortage by taking advantage of IP neighborhood relations. In many cases simple models are implicitly used or chosen intuitively as a network based heuristic. In this paper we review and formalize existing models including a hierarchical clustering approach first. In addition, we present a modified k-means clustering algorithm for source IP density estimation as well as a statistical motivated smoothing approach using the Nadaraya-Watson kernel-weighted average. For performance evaluation we apply all methods on a 90 days real world dataset consisting of 1.3 million different source IP addresses and try to predict the users of the following next 10 days. ROC curves and an example DDoS mitigation scenario show that there is no uniformly better approach: k-means performs best when a high detection rate is needed whereas statistical smoothing works better for low false alarm rate requirements like the DDoS mitigation scenario.","PeriodicalId":169468,"journal":{"name":"2009 International Conference on Availability, Reliability and Security","volume":"239 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference on Availability, Reliability and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARES.2009.113","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Source IP addresses are often used as a major feature for user modeling in computer networks. Particularly in the field of Distributed Denial of Service (DDoS) attack detection and mitigation traffic models make extensive use of source IP addresses for detecting anomalies. Typically the real IP address distribution is strongly undersampled due to a small amount of observations. Density estimation overcomes this shortage by taking advantage of IP neighborhood relations. In many cases simple models are implicitly used or chosen intuitively as a network based heuristic. In this paper we review and formalize existing models including a hierarchical clustering approach first. In addition, we present a modified k-means clustering algorithm for source IP density estimation as well as a statistical motivated smoothing approach using the Nadaraya-Watson kernel-weighted average. For performance evaluation we apply all methods on a 90 days real world dataset consisting of 1.3 million different source IP addresses and try to predict the users of the following next 10 days. ROC curves and an example DDoS mitigation scenario show that there is no uniformly better approach: k-means performs best when a high detection rate is needed whereas statistical smoothing works better for low false alarm rate requirements like the DDoS mitigation scenario.
使用密度估计源IP地址的服务器端预测
在计算机网络中,源IP地址经常被用作用户建模的主要特征。特别是在分布式拒绝服务(DDoS)攻击检测和缓解流量模型广泛使用源IP地址来检测异常。通常情况下,真实的IP地址分布由于观测量少而严重欠采样。密度估计利用IP邻域关系克服了这一不足。在许多情况下,简单的模型被隐式地使用或直观地选择为基于网络的启发式。在本文中,我们首先回顾和形式化现有的模型,包括层次聚类方法。此外,我们提出了一种改进的k-means聚类算法用于源IP密度估计,以及使用Nadaraya-Watson核加权平均的统计动机平滑方法。为了进行性能评估,我们将所有方法应用于由130万个不同源IP地址组成的90天真实世界数据集,并尝试预测接下来10天的用户。ROC曲线和一个示例DDoS缓解场景表明,没有统一的更好的方法:k-means在需要高检测率时表现最好,而统计平滑在像DDoS缓解场景这样的低误报率要求下效果更好。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信