How to Count Bots in Longitudinal Datasets of IP Addresses

Leon Böck, Dave Levin, Ramakrishna Padmanabhan, C. Doerr, M. Mühlhäuser, Telecooperation Lab
{"title":"How to Count Bots in Longitudinal Datasets of IP Addresses","authors":"Leon Böck, Dave Levin, Ramakrishna Padmanabhan, C. Doerr, M. Mühlhäuser, Telecooperation Lab","doi":"10.14722/ndss.2023.24002","DOIUrl":null,"url":null,"abstract":"—Estimating the size of a botnet is one of the most basic and important queries one can make when trying to understand the impact of a botnet. Surprisingly and unfortunately, this seemingly simple task has confounded many measurement efforts. While it may seem tempting to simply count the number of IP addresses observed to be infected, it is well-known that doing so can lead to drastic overestimates, as ISPs commonly assign new IP addresses to hosts. As a result, estimating the number of infected hosts given longitudinal datasets of IP addresses has remained an open problem. In this paper, we present a new data analysis technique, CARDCount , that provides more accurate size estimations by accounting for IP address reassignments. CARDCount can be applied on longer windows of observations than prior approaches (weeks compared to hours), and is the first technique of its kind to provide confidence intervals for its size estimations. We evaluate CARDCount on three real world datasets and show that it performs equally well to existing solutions on synthetic ideal situations, but drastically outperforms all previous work in realistic botnet situations. For the Hajime and Mirai botnets, we estimate that CARDCount, is 51.6% and 69.1% more accurate than the state of the art techniques when estimating the botnet size over a 28-day window.","PeriodicalId":199733,"journal":{"name":"Proceedings 2023 Network and Distributed System Security Symposium","volume":"05 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 2023 Network and Distributed System Security Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14722/ndss.2023.24002","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

—Estimating the size of a botnet is one of the most basic and important queries one can make when trying to understand the impact of a botnet. Surprisingly and unfortunately, this seemingly simple task has confounded many measurement efforts. While it may seem tempting to simply count the number of IP addresses observed to be infected, it is well-known that doing so can lead to drastic overestimates, as ISPs commonly assign new IP addresses to hosts. As a result, estimating the number of infected hosts given longitudinal datasets of IP addresses has remained an open problem. In this paper, we present a new data analysis technique, CARDCount , that provides more accurate size estimations by accounting for IP address reassignments. CARDCount can be applied on longer windows of observations than prior approaches (weeks compared to hours), and is the first technique of its kind to provide confidence intervals for its size estimations. We evaluate CARDCount on three real world datasets and show that it performs equally well to existing solutions on synthetic ideal situations, but drastically outperforms all previous work in realistic botnet situations. For the Hajime and Mirai botnets, we estimate that CARDCount, is 51.6% and 69.1% more accurate than the state of the art techniques when estimating the botnet size over a 28-day window.
如何在IP地址的纵向数据集中计算机器人
估计僵尸网络的大小是一个人在试图了解僵尸网络的影响时可以做的最基本和最重要的查询之一。令人惊讶和不幸的是,这个看似简单的任务混淆了许多度量工作。虽然简单地计算观察到的受感染IP地址的数量似乎很诱人,但众所周知,这样做可能导致严重的高估,因为isp通常会为主机分配新的IP地址。因此,在给定IP地址纵向数据集的情况下,估计受感染主机的数量仍然是一个悬而未决的问题。在本文中,我们提出了一种新的数据分析技术,CARDCount,它通过考虑IP地址重新分配来提供更准确的大小估计。与以前的方法相比,CARDCount可以应用于更长的观察窗口(数周与数小时相比),并且是同类技术中第一个为其大小估计提供置信区间的技术。我们在三个真实世界的数据集上评估CARDCount,并表明它在合成理想情况下的表现与现有解决方案一样好,但在现实僵尸网络情况下,它的表现大大优于以前的所有工作。对于Hajime和Mirai僵尸网络,我们估计CARDCount在估计28天内僵尸网络大小时比最先进的技术准确51.6%和69.1%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信