网络流量异常检测与预测的时间序列数据集。

IF 6.9 2区 综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES
Josef Koumar, Karel Hynek, Tomáš Čejka, Pavel Šiška
{"title":"网络流量异常检测与预测的时间序列数据集。","authors":"Josef Koumar, Karel Hynek, Tomáš Čejka, Pavel Šiška","doi":"10.1038/s41597-025-04603-x","DOIUrl":null,"url":null,"abstract":"<p><p>Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. Most approaches to anomaly detection use methods based on forecasting. Extensive real-world network datasets for forecasting and anomaly detection techniques are missing, potentially causing overestimation of anomaly detection algorithm performance and fabricating the illusion of progress. This manuscript tackles this issue by introducing a comprehensive dataset derived from 40 weeks of traffic transmitted by 275,000 active IP addresses in the CESNET3 network-an ISP network serving approximately half a million customers daily. It captures the behavior of diverse network entities, reflecting the variability typical of an ISP environment. This variability provides a realistic and challenging environment for developing forecasting and anomaly detection models, enabling evaluations that are closer to real-world deployment scenarios. It provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"338"},"PeriodicalIF":6.9000,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11865510/pdf/","citationCount":"0","resultStr":"{\"title\":\"CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting.\",\"authors\":\"Josef Koumar, Karel Hynek, Tomáš Čejka, Pavel Šiška\",\"doi\":\"10.1038/s41597-025-04603-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. Most approaches to anomaly detection use methods based on forecasting. Extensive real-world network datasets for forecasting and anomaly detection techniques are missing, potentially causing overestimation of anomaly detection algorithm performance and fabricating the illusion of progress. This manuscript tackles this issue by introducing a comprehensive dataset derived from 40 weeks of traffic transmitted by 275,000 active IP addresses in the CESNET3 network-an ISP network serving approximately half a million customers daily. It captures the behavior of diverse network entities, reflecting the variability typical of an ISP environment. This variability provides a realistic and challenging environment for developing forecasting and anomaly detection models, enabling evaluations that are closer to real-world deployment scenarios. It provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.</p>\",\"PeriodicalId\":21597,\"journal\":{\"name\":\"Scientific Data\",\"volume\":\"12 1\",\"pages\":\"338\"},\"PeriodicalIF\":6.9000,\"publicationDate\":\"2025-02-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11865510/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Scientific Data\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1038/s41597-025-04603-x\",\"RegionNum\":2,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific Data","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41597-025-04603-x","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

摘要

网络流量异常检测对于维护计算机网络安全、识别恶意活动至关重要。大多数异常检测方法使用基于预测的方法。预测和异常检测技术的大量真实网络数据集缺失,可能导致对异常检测算法性能的高估,并制造进步的错觉。本文通过引入一个综合数据集来解决这个问题,该数据集来自CESNET3网络(一个每天为大约50万客户提供服务的ISP网络)中275,000个活跃IP地址传输的40周流量。它捕获不同网络实体的行为,反映ISP环境的典型可变性。这种可变性为开发预测和异常检测模型提供了一个现实的、具有挑战性的环境,使评估更接近真实的部署场景。它为基于预测的异常检测方法的实际部署提供了有价值的见解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting.

CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting.

CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting.

CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting.

Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. Most approaches to anomaly detection use methods based on forecasting. Extensive real-world network datasets for forecasting and anomaly detection techniques are missing, potentially causing overestimation of anomaly detection algorithm performance and fabricating the illusion of progress. This manuscript tackles this issue by introducing a comprehensive dataset derived from 40 weeks of traffic transmitted by 275,000 active IP addresses in the CESNET3 network-an ISP network serving approximately half a million customers daily. It captures the behavior of diverse network entities, reflecting the variability typical of an ISP environment. This variability provides a realistic and challenging environment for developing forecasting and anomaly detection models, enabling evaluations that are closer to real-world deployment scenarios. It provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Scientific Data
Scientific Data Social Sciences-Education
CiteScore
11.20
自引率
4.10%
发文量
689
审稿时长
16 weeks
期刊介绍: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than testing hypotheses or presenting new interpretations, methods, or in-depth analyses.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信