How to Count Bots in Longitudinal Datasets of IP Addresses

Proceedings 2023 Network and Distributed System Security Symposium. DOI: 10.14722/ndss.2023.24002

Leon Böck, Dave Levin, Ramakrishna Padmanabhan, C. Doerr, M. Mühlhäuser, Telecooperation Lab


Citation count: 2

Abstract

Estimating the size of a botnet is one of the most basic and important queries one can make when trying to understand the botnet's impact. Surprisingly and unfortunately, this seemingly simple task has confounded many measurement efforts. While it may seem tempting to simply count the number of IP addresses observed to be infected, it is well known that doing so can lead to drastic overestimates, as ISPs commonly assign new IP addresses to hosts. As a result, estimating the number of infected hosts given longitudinal datasets of IP addresses has remained an open problem. In this paper, we present a new data analysis technique, CARDCount, that provides more accurate size estimations by accounting for IP address reassignments. CARDCount can be applied to longer windows of observations than prior approaches (weeks compared to hours), and is the first technique of its kind to provide confidence intervals for its size estimations. We evaluate CARDCount on three real-world datasets and show that it performs as well as existing solutions in synthetic ideal situations, but drastically outperforms all previous work in realistic botnet situations. For the Hajime and Mirai botnets, we estimate that CARDCount is 51.6% and 69.1% more accurate, respectively, than state-of-the-art techniques when estimating the botnet size over a 28-day window.
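To make the overcounting problem concrete, here is a minimal Python sketch of why counting distinct IP addresses over a long window inflates the estimate. It is not CARDCount and does not reproduce any step of the paper's method; it only illustrates the naive baseline the abstract warns against. The function names, the 10.0.0.0 address range, and the fixed, synchronized lease length are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical base address for the toy address pool (an assumption, not from the paper).
BASE = int(ipaddress.IPv4Address("10.0.0.0"))


def simulate_sightings(num_hosts, num_days, lease_days):
    """Return (day, ip) sightings for infected hosts.

    Toy model: every host is seen once per day and receives a fresh,
    never-reused address every `lease_days` days.
    """
    leases_per_host = -(-num_days // lease_days)  # ceiling division
    sightings = []
    for host in range(num_hosts):
        for day in range(num_days):
            lease_index = day // lease_days
            ip = ipaddress.IPv4Address(BASE + host * leases_per_host + lease_index)
            sightings.append((day, ip))
    return sightings


def naive_ip_count(sightings):
    """Count distinct IP addresses over the whole observation window."""
    return len({ip for _, ip in sightings})


def daily_ip_count(sightings, day):
    """Count distinct IP addresses seen on a single day."""
    return len({ip for d, ip in sightings if d == day})


if __name__ == "__main__":
    sightings = simulate_sightings(num_hosts=100, num_days=28, lease_days=7)
    print("true hosts:", 100)
    print("distinct IPs over 28 days:", naive_ip_count(sightings))
    print("distinct IPs on day 0:", daily_ip_count(sightings, 0))
```

In this toy setting, 100 hosts that change address weekly appear as 400 distinct IP addresses over a 28-day window, while any single day shows only 100. Real reassignment behavior is far less regular, which is why a dedicated estimation technique is needed.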


